> we need to implement a caching scheme to at least amortize the overhead of thi...

rrnewton · on Nov 23, 2022

Yeah, it's interesting to think about persisting the state we would need to make the file system more sympatico with Hermit. If we were willing to have a daemon.... Meta develops this "watchman" tool that our build infrastructure uses. I think for existing file systems we could imagine a daemon that watches the directory and caches what we need.

But if we could dream of new file systems, then I want one that is basically btrfs but with O(1) file-level parallel-hashes (rather than block level). Heck maybe it could even enforce a sorted order on directories for us too. The hashes would be very useful in record-and-replay scenarios, where you could know immediately whether the input files are in your content-addressible blob storage without having to hash them every time.

(We have some hash support in Meta's EdenFS FUSE file system https://github.com/facebook/sapling)

P.S. about Reproducible Builds -- we pitched this to the community in 2019 (at the Reproducible Builds Summit) and with that ASPLOS'20 paper, but we would be eager to see anyone wanting to pick it up and take it further.