> So it’s for cases where you have any key but associated with one of only (preferably few) discrete values
We use it for a case with ~1 million unique values, but it's certainly more space-efficient when you have tens, hundreds, or thousands of distinct values. The "Space Requirements" section of the README has a few examples: https://github.com/onecodex/rust-bfield?tab=readme-ov-file#s... (e.g., with 32 distinct possible values, each key-value pair takes ~27 bits of space at a 0.1% false positive rate).
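For a rough feel of why fewer distinct values is cheaper, here is a hypothetical sketch (not taken from rust-bfield; every name in it is illustrative). It assumes each value is stored as a ν-bit marker with exactly κ bits set. Only C(ν, κ) such markers exist, so ν has to grow until C(ν, κ) ≥ θ, the number of distinct values. Values are mapped to markers and back with the combinatorial number system.

```rust
/// Binomial coefficient C(n, k); returns 0 when k > n.
fn binom(n: u64, k: u64) -> u64 {
    if k > n {
        return 0;
    }
    let k = k.min(n - k); // symmetry keeps intermediate values small
    let mut r: u64 = 1;
    for i in 0..k {
        // After this step r == C(n, i + 1), so the division is always exact.
        r = r * (n - i) / (i + 1);
    }
    r
}

/// Hypothetical helper: smallest marker width nu with C(nu, kappa) >= theta.
fn min_marker_width(theta: u64, kappa: u32) -> u32 {
    assert!(kappa >= 1, "kappa must be at least 1");
    let mut nu = kappa;
    while binom(nu as u64, kappa as u64) < theta {
        nu += 1;
    }
    nu
}

/// Encode `value` (0 <= value < C(nu, kappa)) as a nu-bit marker with exactly kappa bits set.
fn encode(mut value: u64, nu: u32, kappa: u32) -> u64 {
    assert!(nu <= 64 && kappa <= nu);
    assert!(value < binom(nu as u64, kappa as u64), "value out of range");
    let mut marker = 0u64;
    let mut upper = nu; // the next chosen bit position must be < upper
    for i in (1..=kappa as u64).rev() {
        // Greedily pick the largest position c with C(c, i) <= value.
        let mut c = upper - 1;
        while binom(c as u64, i) > value {
            c -= 1;
        }
        marker |= 1u64 << c;
        value -= binom(c as u64, i);
        upper = c;
    }
    marker
}

/// Decode a marker back to its value: the r-th lowest set bit at position p contributes C(p, r).
fn decode(marker: u64) -> u64 {
    let mut value = 0u64;
    let mut rank = 0u64;
    for pos in 0..64u64 {
        if (marker >> pos) & 1 == 1 {
            rank += 1;
            value += binom(pos, rank);
        }
    }
    value
}

fn main() {
    for theta in [4u64, 32, 1000] {
        let nu = min_marker_width(theta, 3);
        println!("{theta} distinct values with 3 set bits need a {nu}-bit marker");
    }
    let marker = encode(17, 7, 3);
    println!("value 17 -> marker {marker:07b} -> value {}", decode(marker));
}
```

With κ = 3, 32 values fit in 7-bit markers (C(7, 3) = 35), while 1,000 values need 20 bits, so under this assumption the marker, and with it the space per key, widens as θ grows.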
> all the docs say “designed for in-memory lookups”
We use mmap for persistence because our use case is largely build once, read many times. As a practical matter, the data structure involves lots of random access, so it's better suited to in-memory use from a speed POV.
Sure, but I wouldn't expect the API to force you to use an mmap when a slice of bytes would accomplish the same thing when the data isn't persisted (and the user could choose to persist it via a different mechanism if you provided an .into() method that decays self into a Vec<u8>/Box<[u8]>/etc.).
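A minimal sketch of the kind of conversion described here, assuming a hypothetical in-memory type (InMemoryField, set_bit, and get_bit are illustrative names, not rust-bfield's API) that owns a Box<[u8]> and decays into a Vec<u8> through From/Into:

```rust
/// Hypothetical container that owns its bits directly, with no mmap involved.
struct InMemoryField {
    bits: Box<[u8]>,
}

impl InMemoryField {
    fn with_len(n_bytes: usize) -> Self {
        InMemoryField { bits: vec![0u8; n_bytes].into_boxed_slice() }
    }

    fn set_bit(&mut self, i: usize) {
        self.bits[i / 8] |= 1u8 << (i % 8);
    }

    fn get_bit(&self, i: usize) -> bool {
        self.bits[i / 8] & (1u8 << (i % 8)) != 0
    }
}

// "Decay" the structure into its raw bytes so the caller decides how to persist them.
impl From<InMemoryField> for Vec<u8> {
    fn from(field: InMemoryField) -> Vec<u8> {
        field.bits.into_vec()
    }
}

// Rebuild the structure from bytes loaded by whatever mechanism the caller chose.
impl From<Vec<u8>> for InMemoryField {
    fn from(bytes: Vec<u8>) -> Self {
        InMemoryField { bits: bytes.into_boxed_slice() }
    }
}

fn main() {
    let mut field = InMemoryField::with_len(4);
    field.set_bit(10);
    let bytes: Vec<u8> = field.into();
    // `bytes` could now go to a file, a database blob, a network socket, etc.
    let restored = InMemoryField::from(bytes);
    assert!(restored.get_bit(10));
    assert!(!restored.get_bit(11));
}
```

After into(), the caller owns the bytes, so persistence becomes their choice rather than the library's.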
If I were designing this library, I would internally use an enum { Mapped(mmap), Direct(Box<[u8]>) }, or better yet, delegate access and serialization/persistence to a trait so the type becomes BField<Impl>, where Impl implements a trait that provides as_slice() and load()/save().
This way you abstract over the OS internals, provide a pure implementation for testing or no_std, and probably improve your codegen a bit.
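A rough sketch of the trait-based version, using std only. Storage, Direct, and get_bit are hypothetical names; load/save here take std::io readers and writers, so a real no_std backend would need different signatures. An mmap-backed type (e.g. built on the memmap2 crate) could implement the same trait.

```rust
use std::io::{self, Read, Write};

/// Hypothetical storage trait: lookups only need a byte view,
/// while loading and saving are left to the backend.
trait Storage: Sized {
    fn as_slice(&self) -> &[u8];
    fn load<R: Read>(reader: &mut R) -> io::Result<Self>;
    fn save<W: Write>(&self, writer: &mut W) -> io::Result<()>;
}

/// Pure, heap-backed storage with no OS-specific code; handy for tests.
struct Direct(Box<[u8]>);

impl Storage for Direct {
    fn as_slice(&self) -> &[u8] {
        &self.0
    }

    fn load<R: Read>(reader: &mut R) -> io::Result<Self> {
        let mut buf = Vec::new();
        reader.read_to_end(&mut buf)?;
        Ok(Direct(buf.into_boxed_slice()))
    }

    fn save<W: Write>(&self, writer: &mut W) -> io::Result<()> {
        writer.write_all(&self.0)
    }
}

/// The field is generic over its storage; query code is written once against as_slice().
struct BField<S: Storage> {
    storage: S,
}

impl<S: Storage> BField<S> {
    fn get_bit(&self, i: usize) -> bool {
        let bytes = self.storage.as_slice();
        bytes[i / 8] & (1u8 << (i % 8)) != 0
    }
}

fn main() -> io::Result<()> {
    let field = BField { storage: Direct(vec![0b0000_0101, 0].into_boxed_slice()) };

    // Round-trip through an in-memory buffer instead of a file.
    let mut saved: Vec<u8> = Vec::new();
    field.storage.save(&mut saved)?;
    let reloaded = BField { storage: Direct::load(&mut saved.as_slice())? };

    assert!(reloaded.get_bit(0));
    assert!(!reloaded.get_bit(1));
    assert!(reloaded.get_bit(2));
    Ok(())
}
```

Because lookups go only through as_slice(), the query code is monomorphized per backend, which is presumably where the codegen benefit would come from.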
> fyi, you use temp::temp_file() but never actually use the result, instead using the hard-coded /tmp path
Thank you! I've opened an issue and we'll fix it.