"Sequential synchronous random writes" means that you issue writes with random keys in sequence, waiting for each to be synchronously written to durable storage before writing the next. This is an important metric in certain domains, and it is generally bound by the underlying hardware regardless of database architecture (though certain designs can make it much worse).
How does LMDB outperform log-structured databases when writing page-sized objects with random keys when the working set is larger than RAM? I'm genuinely curious. Any other Btree-based database will perform at disk read seek rates, at best.
OK, for your sequential synchronous random write case, LMDB is fair-to-middling. In fully synchronous mode (the default) it takes 2 fsyncs per transaction, so at least 2 IOPS per write. Many other DBs (including SQLite, Berkeley DB, and Kyoto Cabinet) take even more than 2 fsyncs per transaction and are slower than LMDB. Some others take only 1, and are much faster on this workload.
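For reference, here's a sketch of how those two modes look in LMDB's C API. MDB_NOMETASYNC is the real flag that drops a commit to a single sync; the path, map size, and helper name below are just placeholders:

```c
#include <lmdb.h>

/* Sketch only: error checks omitted. */
static MDB_env *open_env(const char *path, int single_fsync) {
    MDB_env *env;
    mdb_env_create(&env);
    mdb_env_set_mapsize(env, (size_t)10 << 30);   /* 10 GiB map, placeholder */
    /* Flags = 0 is the fully synchronous default: one sync for the data
     * pages plus one for the meta page per commit.  MDB_NOMETASYNC defers
     * the meta flush, so a commit costs one sync; a crash can undo the
     * last transaction but never corrupts the database. */
    mdb_env_open(env, path, single_fsync ? MDB_NOMETASYNC : 0, 0664);
    return env;
}
```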
LMDB outperforms all other engines on larger-than-RAM DBs with page-sized objects because its zero-copy architecture doesn't waste any RAM. For other DBs you must dedicate some fraction of RAM to write buffers and page caches in order to obtain their maximum performance. This means the largest working set they can keep in RAM is much smaller than the actual physical RAM. With LMDB there are no separate write buffers or caches; all available RAM can be used for your working set. This is precisely the scenario tested here: http://symas.com/mdb/hyperdex/
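A short sketch of what "zero-copy" means at the API level (the helper name is mine; error handling omitted): mdb_get hands back a pointer directly into the memory-mapped file, so the OS page cache is the only cache and no RAM is spent on a second private copy.

```c
#include <lmdb.h>
#include <stdio.h>
#include <string.h>

static void lookup(MDB_env *env, MDB_dbi dbi, const char *keystr) {
    MDB_txn *txn;
    MDB_val key, data;

    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    key.mv_size = strlen(keystr);
    key.mv_data = (void *)keystr;
    if (mdb_get(txn, dbi, &key, &data) == 0) {
        /* data.mv_data points into the mapped file itself; nothing was
         * copied, and the pointer is valid only until the txn ends. */
        printf("%.*s\n", (int)data.mv_size, (char *)data.mv_data);
    }
    mdb_txn_abort(txn);   /* read-only txns are simply aborted */
}
```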
I'll note that in that test, LMDB performed at exactly the disk read seek rate; HyperLevelDB's LSM performed much worse than that.
Last I tested, Berkeley DB issues only one fsync per transaction commit (after writing to the log). (Although Berkeley DB screws up even this by writing unaligned log entries of arbitrary size, which means that SSDs, for example, have to perform expensive read-modify-writes.)
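To illustrate the alignment point, here's a generic write-ahead-log sketch (not Berkeley DB's actual code): padding each commit record out to the device block size lets the drive overwrite whole blocks instead of reading a partial block back, merging, and rewriting it.

```c
#include <string.h>
#include <unistd.h>

#define BLOCK 4096   /* assumed device block size */

/* Append one commit record to the log, padded to a block boundary.
 * Assumes len <= BLOCK; a real WAL would also batch records per block. */
static void log_append(int log_fd, const void *rec, size_t len) {
    char buf[BLOCK];
    memset(buf, 0, sizeof(buf));
    memcpy(buf, rec, len);
    write(log_fd, buf, sizeof(buf));   /* a whole, aligned block */
    fsync(log_fd);                     /* the single commit fsync */
}
```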
The interesting case for LSM isn't random read or sequential insert, it's random insert (which the 20/80 update/read load doesn't really cover). While it's good to see that LMDB outperforms LevelDB in the former two cases, one would be foolish to use an LSM database if random insert (and/or delete) weren't important to them. (Of course the world has no shortage of fools.)
Throw a larger-than-RAM rotary-disk-backed random insert workload at LevelDB vs. LMDB. I would expect LevelDB to perform at around 1/4 the disk's write bandwidth (probably ~2k IOPS with 4 KiB items; same as for sequential insert), and LMDB to perform at the disk's seek rate (~100 IOPS).
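For concreteness, here's a sketch of that random-insert workload against LMDB (one 4 KiB value under a random key per synchronous commit; the helper name and counts are placeholders, error checks omitted). On a larger-than-RAM B-tree, each insert must first fetch a random leaf page, hence roughly one seek per insert; an LSM instead batches inserts into sequential log and compaction writes.

```c
#include <lmdb.h>
#include <stdlib.h>

static void random_insert(MDB_env *env, MDB_dbi dbi, long n) {
    char value[4096] = {0};   /* page-sized item */
    for (long i = 0; i < n; i++) {
        MDB_txn *txn;
        MDB_val key, data;
        long k = rand();      /* random key: defeats all locality */

        mdb_txn_begin(env, NULL, 0, &txn);
        key.mv_size = sizeof(k);      key.mv_data = &k;
        data.mv_size = sizeof(value); data.mv_data = value;
        mdb_put(txn, dbi, &key, &data, 0);
        mdb_txn_commit(txn);  /* synchronous: durable before the next */
    }
}
```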
Please note I'm not trying to bash LMDB, or claim that LevelDB or LSM in general is superior for anything but random insert workloads. But you made an extraordinary claim, that LMDB is leaps and bounds better than any other data storage system! As someone whose job it has been to wrangle extreme performance out of certain databases, it behooves me to ask for extraordinary evidence :)
There are many modes being compared (synchronous/asynchronous, sequential/random reads/writes, small/big values, in-memory/SSD/HDD). The one you want (synchronous, random writes, HDD) shows this:
- Small values (100 B): LevelDB is at 1291 ops/sec, LMDB is at 297 ops/sec.
- Large values (100 KiB): everyone is at ~115 ops/sec.