I have 2 projects that I'm looking to eventually adapt into a database backend that's API-compatible with RocksDB (with enhancements!). The first is an Extendible Hashing implementation in Rust (it was my first attempt at Rust, so it's kinda messy): https://github.com/chiefnoah/MehDB
It achieves very promising performance (~1M writes/s, ~5M reads/s) for u64-sized values, which will eventually be offsets into a log. The core concept is that all inserts and queries are O(1) with a bounded number of database reads. The performance characteristics are not favorable for small datasets, but they are very favorable for large ones. However, I have to use hash digests as pseudokeys for values, which always carries the potential for collisions. To get around this, I plan to move to 128-bit or larger (256-bit) hashes; right now it's just u64 for simplicity.
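To make the read path concrete, here's a minimal Rust sketch of the extendible-hashing lookup idea (simplified, in-memory, and not MehDB's actual layout or API): the top bits of the key's digest pick a directory slot, the slot points at a bucket, and the bucket is scanned for the full digest, so a query costs a bounded number of reads.

```rust
// A minimal sketch of the extendible-hashing read path (simplified; not the
// actual MehDB layout): the top `global_depth` bits of the key's digest pick
// a directory slot, the slot points at a bucket, and the bucket is scanned
// for the full digest -- one directory read plus one bucket read per query.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Bucket {
    local_depth: u8,
    // (hash digest used as pseudokey, offset into the value log)
    entries: Vec<(u64, u64)>,
}

struct Directory {
    global_depth: u8,
    // One bucket per slot here for simplicity; on disk, multiple slots can
    // point at the same bucket until it splits.
    buckets: Vec<Bucket>,
}

fn digest(key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

impl Directory {
    fn get(&self, key: &[u8]) -> Option<u64> {
        let d = digest(key);
        let slot = if self.global_depth == 0 {
            0
        } else {
            (d >> (64 - self.global_depth)) as usize
        };
        self.buckets[slot]
            .entries
            .iter()
            // A 64-bit digest can collide, hence the plan to move to 128/256-bit.
            .find(|(k, _)| *k == d)
            .map(|(_, offset)| *offset)
    }
}
```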
The second explores a similar concept using modified B+Trees that keep a subtree of all writes to each record: https://github.com/chiefnoah/hist-prototype
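Loosely, the idea looks like the following Rust sketch, with the per-record subtree flattened into a nested ordered map rather than real on-disk B+Tree pages (this is an illustration of the concept, not the hist-prototype code):

```rust
// A loose sketch of the "subtree per record" idea: the outer tree is keyed by
// record key, the inner tree by transaction id, so every write appends a new
// version instead of overwriting the old one.
use std::collections::BTreeMap;

type Tx = u64;

#[derive(Default)]
struct HistStore {
    // key -> (tx -> value); the inner map stands in for the per-record subtree
    data: BTreeMap<Vec<u8>, BTreeMap<Tx, Vec<u8>>>,
}

impl HistStore {
    fn put(&mut self, key: &[u8], tx: Tx, value: Vec<u8>) {
        self.data.entry(key.to_vec()).or_default().insert(tx, value);
    }

    // Read the value as of `tx`: the newest version at or before that transaction.
    fn get_at(&self, key: &[u8], tx: Tx) -> Option<&Vec<u8>> {
        self.data
            .get(key)?
            .range(..=tx)
            .next_back()
            .map(|(_, v)| v)
    }
}
```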
hist-prototype is implemented in Python for fast iteration, since I realized I wasn't happy with how quickly I could iterate in Rust. It is, IMO, a more complete approach toward a fully historical-query-capable system. I'm slowly chipping away at it, though I haven't made progress lately. I spend no real money hosting them, just the code, though I'm certain I've shortened the life of my NVMe drives by writing and rewriting large files for testing.
My ultimate goal with these is to build a general-purpose KV store that can query the entire state of the system at a given point in time (either a timestamp or a TX increment), for the purpose of enhancing graph databases for temporal analysis.
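One way the "timestamp or TX increment" part could work (a standalone sketch with hypothetical names, not code from either repo) is to keep a small ordered index from commit time to transaction id, so a timestamp query resolves to the last committed transaction at or before that instant, and the versioned store is always queried by tx id:

```rust
// Hypothetical point-in-time addressing: timestamps resolve to the last
// committed transaction at or before that instant, and that tx id is what the
// versioned store is ultimately queried with.
use std::collections::BTreeMap;

type Tx = u64;

struct TxClock {
    // commit timestamp (e.g. unix millis) -> transaction id, in commit order
    by_time: BTreeMap<u64, Tx>,
}

enum AsOf {
    Timestamp(u64),
    Tx(Tx),
}

impl TxClock {
    fn resolve(&self, as_of: AsOf) -> Option<Tx> {
        match as_of {
            AsOf::Tx(tx) => Some(tx),
            // Last transaction committed at or before the requested time.
            AsOf::Timestamp(t) => self.by_time.range(..=t).next_back().map(|(_, tx)| *tx),
        }
    }
}
```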