How is this different from backtracking? You're doing a depth-first search over possible interpretations. The grammar is just expressed in the type system instead of usual spec formats.
Critiques in other comments are accurate. This is a tooling nightmare, but also probably a nightmare to read. Consider an expression like
2026 March 10 to 13
What's the binding precedence? Does this mean March 10 through March 13, or midnight to 1 PM on March 10th? I think this breaks down outside of trivial examples that are better achieved in other ways.
Yes, technically this is a form of backtracking, similar to what a parser does. The key difference is that the search is drastically constrained by the type system: reductions are only attempted where the types actually support a binding operator. Unlike a parser exploring all grammar possibilities, this mechanism prunes most candidates automatically, so the compiler efficiently "solves" the expression rather than blindly exploring every syntactic alternative.
Here is the high-level explanation of the mechanism:
But the short answer is that it’s not parser-style backtracking over a grammar.
The Java parser still produces a normal AST for the sequence of tokens. What happens afterward is a type-directed binding phase where adjacent expressions may bind if their types agree on a binding operator. The compiler effectively reduces the expression by forming larger typed expressions until it reaches a stable form.
The algorithm favors left associativity, but since a type can implement the binding operator as either the left or right operand, the overall structure of the expression can emerge in different ways depending on the participating types.
So rather than exploring grammar productions, the compiler is solving a set of type-compatible reductions across the expression.
The meaning is therefore determined entirely by which types participate in binding e.g., `LocalDate`, `Month`, `Integer`, `Range`, etc. and which reductions they define.
If a competing interpretation exists but the types don’t support the necessary bindings, it simply never forms.
In that sense it behaves less like a traditional parser and more like a typed reduction system layered on top of the Java AST.
You avoid an unnecessary copy. Normal read system call gets the data from disk hardware into the kernel page cache and then copies it into the buffer you provide in your process memory. With mmap, the page cache is mapped directly into your process memory, no copy.
All running processes share the mapped copy of the file.
There are a lot of downsides to mmap: you lose explicit error handling and fine-grained control of when exactly I/O happens. Consult the classic article on why sophisticated systems like DBMSs do not use mmap: https://db.cs.cmu.edu/mmap-cidr2022/
I've never had to use mmap but this is always been the issue in my head. If you're treating I/O as memory pages, what happens when you read a page and it needs to "fault" by reading the backing storage but the storage fails to deliver? What can be said at that point, or does the program crash?
If you fail to load an mmapped page because of an I/O error, Unix-like OSes interrupt your program with SIGBUS/SIGSEGV. It might be technically possible to write a program that would handle those signals and recover, but it seems like a lot more work and complexity than just checking errno after a read system call.
If an I/O error happens with read()/write(), you get back an error code, which SQLite can deal with and pass back up to the application, perhaps accompanied by a reasonable error message. But if you get an I/O error with mmap, you get a signal. SQLite itself ought not be setting signal handlers, as that is the domain of the application and SQLite is just a lowly library. And even if SQLite could set signal handlers, it would be difficult to associate a signal with a particular I/O operation. So there isn't a good way to deal with I/O errors when using mmap(). With mmap(), you just have to assume that the filesystem/mass-storage works flawlessly and never runs out of space.
SQLite can use mmap(). That is a tested and supported capability. But we don't advocate it because of the inability to precisely identify I/O errors and report them back up into the application.
> The operating system must have a unified buffer cache in order for the memory-mapped I/O extension to work correctly, especially in situations where two processes are accessing the same database file and one process is using memory-mapped I/O while the other is not. Not all operating systems have a unified buffer cache. In some operating systems that claim to have a unified buffer cache, the implementation is buggy and can lead to corrupt databases.
What are those OSes with buggy unified buffer caches? More importantly, is there a list of platforms where the use of mmap in sqlite can lead to data loss?
I know that the spirit of HN will strike me down for this, but sqlite is not a "sophisticated system". It assumes the hardware is lawful neutral. Real hardware is chaotic. Sqlite has a good reputation because it is very easy to use. In fact this is the same reason programmers like mmap: it is a hell of a shortcut.
I think the main thing is whether mmap will make sqlite lose data or otherwise corrupt already committed data
... it will if two programs open the same sqlite, one with mmap, and another without https://www.sqlite.org/mmap.html - at least "in some operating systems" (no mention of which ones)
> The operating system must have a unified buffer cache in order for the memory-mapped I/O extension to work correctly, especially in situations where two processes are accessing the same database file and one process is using memory-mapped I/O while the other is not. Not all operating systems have a unified buffer cache. In some operating systems that claim to have a unified buffer cache, the implementation is buggy and can lead to corrupt databases.
Sqlite is otherwise rock solid and won't lose data as easily
* You want your program to crash on any I/O error because you wouldn't handle them anyway
* You value the programming convenience of being able to treat a file on disk as if the entire thing exists in memory
* The performance is good enough for your use. As the article showed, sequential scan performance is as good as direct I/O until the page cache fills up *from a single SSD*, and random access performance is as good as direct I/O until the page cache fills up *if you use MADV_RANDOM*. If your data doesn't fit in memory, or is across multiple storage devices, or you don't correctly advise the OS about your access patterns, mmap will probably be much slower
To be clear, normal I/O still benefits from the OS's shared page cache, where files that other processes have loaded will probably still be in memory, avoiding waiting on the storage device. But each normal I/O process incurs the space and time cost of a copy into its private memory, unlike mmap.
One reason to use shared memory mmap is to ensure that even if your process crashes, the memory stays intact. Another is to communicate between different processes.
Mad Max Middle-earth: Shadow of War Deus Ex: Mankind Divided Yakuza: Like a Dragon