Store IDs in MongoDB as binary or as string?

I was curious whether MongoDB's compression can store IDs efficiently when they are represented as strings instead of in a more compact binary form. So I made a benchmark and measured the compression performance of the three available compressors: zlib, snappy, and zstd.

Results for MongoDB 4.4.1 of storing 128-bit random values (e.g., UUIDs) as binary (16 bytes) or as base-58-encoded strings (22 characters). All sizes are in bytes:

ID type  Compressor  size     count   avgObjSize  storageSize  totalIndexSize  totalSize
Binary   none        3100000  100000  31          3645440      2523136         6168576
String   none        3697229  100000  36          4243456      3325952         7569408
Binary   snappy      3100000  100000  31          2404352      2519040         4923392
String   snappy      3697196  100000  36          3022848      3330048         6352896
Binary   zlib        3100000  100000  31          2142208      2519040         4661248
String   zlib        3697150  100000  36          2203648      3334144         5537792
Binary   zstd        3100000  100000  31          1892352      2531328         4423680
String   zstd        3697196  100000  36          2080768      3330048         5410816
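To see where the avgObjSize values come from, here is a minimal sketch that hand-encodes one BSON document per representation, assuming each document holds only the ID in the `_id` field (an assumption, though it reproduces the 31 bytes exactly). The helpers `bson_binary_doc` and `bson_string_doc` are hypothetical and only cover this single-field case.

```python
import os
import struct


def bson_binary_doc(field: str, payload: bytes, subtype: int = 0x00) -> bytes:
    """Hypothetical helper: encode {field: BinData(subtype, payload)} as BSON."""
    # Element: type 0x05, field name as C string, int32 length, subtype, raw bytes.
    # (Subtype 0x04 would mark a UUID; it does not change the size.)
    element = (
        b"\x05" + field.encode() + b"\x00"
        + struct.pack("<i", len(payload)) + bytes([subtype]) + payload
    )
    # Document: int32 total length (counting itself), elements, trailing 0x00.
    return struct.pack("<i", 4 + len(element) + 1) + element + b"\x00"


def bson_string_doc(field: str, text: str) -> bytes:
    """Hypothetical helper: encode {field: text} as BSON."""
    data = text.encode("utf-8") + b"\x00"  # BSON strings are NUL-terminated
    # Element: type 0x02, field name as C string, int32 length (incl. NUL), bytes.
    element = b"\x02" + field.encode() + b"\x00" + struct.pack("<i", len(data)) + data
    return struct.pack("<i", 4 + len(element) + 1) + element + b"\x00"


binary_doc = bson_binary_doc("_id", os.urandom(16))
# Placeholder standing in for a 22-character base-58 ID.
string_doc = bson_string_doc("_id", "2" * 22)
print(len(binary_doc), len(string_doc))
```

In both cases 15 bytes are BSON framing around the value, giving 31 and 37 bytes. The table's size/count works out to about 36.97 bytes per string document, presumably because a few base-58 encodings come out shorter than 22 characters.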

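zstd compression looks really good. Moreover, storing values as binary is clearly more efficient than storing them as strings, even with compression, because the compressors shrink the binary representation as well, despite the values being random (presumably by compressing the per-document BSON framing around each value: type byte, field name, and length prefix). The most compressed string variant (zstd, 5410816 B total) is still larger than the least compressed binary variant (snappy, 4923392 B total). Do note, though, that zlib-compressed strings (5537792 B) and zstd-compressed strings (5410816 B) take less space than uncompressed binary (6168576 B), meaning that those compression algorithms can recover some of the storage lost to the string representation. Part of the remaining gap comes from the indexes: totalIndexSize stays nearly constant across compressors (2519040 to 2531328 B for binary, 3325952 to 3334144 B for strings), so the collection's block compressor does not appear to shrink them; comparing storageSize alone, the gap is smaller (with zstd, 2080768 B for strings vs. 1892352 B for binary). Still, a base-58 string carries exactly the same 128 bits as the binary value, so an ideal compressor would bring both down to the same size. Given that these algorithms compress the binary values even further, it seems there is still room for improvement in them.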
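The point about recovering string overhead can be illustrated outside MongoDB. The following minimal sketch uses only Python's standard-library zlib (not snappy or zstd, and with no BSON framing or storage-engine blocks) on 10,000 random 128-bit IDs. It assumes a Bitcoin-style base-58 alphabet and left-pads every string to exactly 22 characters; both are assumptions, since the post does not say which base-58 variant was used.

```python
import os
import zlib

# Bitcoin-style base-58 alphabet (an assumption, not necessarily the one benchmarked).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(value: bytes, width: int = 22) -> str:
    """Encode bytes as base 58, left-padded with the zero digit to `width` chars."""
    n = int.from_bytes(value, "big")
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    # 58**22 > 2**128, so a 16-byte value never needs more than 22 digits.
    return "".join(reversed(digits)).rjust(width, ALPHABET[0])


N = 10_000
ids = [os.urandom(16) for _ in range(N)]

# Raw IDs back to back, once as 16-byte binary and once as 22-char ASCII strings.
binary_blob = b"".join(ids)
string_blob = "".join(base58_encode(i) for i in ids).encode("ascii")

for name, blob in (("binary", binary_blob), ("string", string_blob)):
    compressed = zlib.compress(blob, 6)
    print(f"{name}: raw={len(blob)} zlib={len(compressed)}")
```

With raw IDs there is no framing to squeeze, so one would expect the binary blob not to shrink at all, while the base-58 text shrinks toward, but not below, 16 bytes per ID: each base-58 character carries only log2(58) ≈ 5.86 bits, so 22 characters hold about 129 bits of capacity for 128 bits of content.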

Note: Compression algorithms generally perform poorly on small inputs, and the objects here were very small. This means the insights here cannot be generalized to storing larger amounts of binary (or string) data in MongoDB. (MongoDB does combine objects into blocks before compressing them, which alleviates this issue.)
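The effect of object size can be sketched with the same standard-library zlib by compressing 1,000 BSON documents of the form `{_id: BinData(...)}` one at a time versus all together in one buffer. The helper `binary_id_doc` is hypothetical, and a single concatenated buffer is only a rough stand-in for how MongoDB groups objects into blocks.

```python
import os
import struct
import zlib


def binary_id_doc() -> bytes:
    """Hypothetical helper: BSON bytes of {_id: BinData(0, <16 random bytes>)}."""
    payload = os.urandom(16)
    # Element: type 0x05, "_id" C string, int32 length 16, subtype 0x00, payload.
    element = b"\x05_id\x00" + struct.pack("<i", 16) + b"\x00" + payload
    # Document: int32 total length (31), the element, trailing 0x00.
    return struct.pack("<i", 4 + len(element) + 1) + element + b"\x00"


docs = [binary_id_doc() for _ in range(1_000)]
raw = sum(len(d) for d in docs)

# Each 31-byte document compressed on its own: zlib header and checksum
# overhead dominate, and there is no earlier data to back-reference.
one_by_one = sum(len(zlib.compress(d)) for d in docs)

# All documents compressed together as one block: the repeated 15 bytes of
# BSON framing between payloads become cheap back-references.
as_block = len(zlib.compress(b"".join(docs)))

print(f"raw={raw} one-by-one={one_by_one} block={as_block}")
```

One would expect the one-by-one total to exceed the raw size, while the block version ends up somewhat above 16 bytes per document, i.e. mostly the random payload.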

If you have any relevant link, project, idea, comment, feedback, critique, language fix, or any other information, please share it below. Thanks.