there are no good modern compiler books - everything that's been written down pales in comparison to what GCC/LLVM actually involve. recently i found Engineering a Compiler by Cooper and Torczon while reviewing/prepping for interviews - it wasn't bad. there's also now LLVM Code Generation by Quentin Colombet, but that's basically a code walkthrough of LLVM (it doesn't cover any of the algorithms), and it was probably out of date the second it got published lol (not really, but maybe). the truth is that trying to learn how to build a compiler from a single book is like trying to learn how to build a skyscraper from a single book.
> the truth is that trying to learn how to build a compiler from a single book
I think you're conflating “learning to build a compiler for a toy language” with “being effective at working on a modern optimizing compiler suite like GCC/LLVM”.
The book is perfectly fine for the first use case, and never claims to touch upon the latter.
IMHO absolutely. The basics of lexing and parsing are still there, and some of the optimizations are also still relevant. You just cannot expect to read the book and then be able to write GCC or LLVM from scratch(1).
For learning about other, more advanced topics in depth, there is:
So maybe writing a compiler with exactly one FE (for a simple language) and one BE (for a simple architecture), with, say, 80% of the optimizations, could be a doable project.
(1) We should define what we mean by that, because there are thousands of front-ends and back-ends.
I heard the new edition is updated with newer material like data-flow analysis, garbage collection, etc. Anyway, the book doesn't teach you how to build a basic working compiler, so you need to consult other materials.
Try Andrew Appel's "Modern Compiler Implementation in Java/C/ML", or Writing a C Compiler (https://norasandler.com/book), which is much more recent.
Eventually you'd want to hack on GCC/LLVM, because they are production-grade compilers.
No, not at all - the teachings and techniques have been surpassed for four decades or so.
The LALR algorithm is flawed: it works for only a subset of CFGs, not all of them. That alone is already a death blow. If you try out BNF grammars in the wild, it is nearly guaranteed that they are complex enough for LALR to shit itself with shift-reduce conflicts - see the sketch below.
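To make that concrete, the classic dangling-else ambiguity is already enough to trip LALR(1). Here is a minimal sketch using the third-party ply package (a yacc-style LALR(1) generator for Python); the toy grammar and token names are invented for illustration:

    # Dangling-else in an LALR(1) generator: after "IF EXPR stmt" with
    # ELSE as lookahead, the parser cannot decide whether to shift the
    # ELSE or reduce the inner if, so building the tables reports a
    # shift/reduce conflict.
    import ply.lex as lex
    import ply.yacc as yacc

    tokens = ('IF', 'ELSE', 'EXPR', 'STMT')
    t_ignore = ' '

    def t_IF(t):
        r'if'
        return t

    def t_ELSE(t):
        r'else'
        return t

    def t_EXPR(t):
        r'e'
        return t

    def t_STMT(t):
        r's'
        return t

    def t_error(t):
        t.lexer.skip(1)

    def p_stmt_if(p):
        'stmt : IF EXPR stmt'
        p[0] = ('if', p[3])

    def p_stmt_if_else(p):
        'stmt : IF EXPR stmt ELSE stmt'
        p[0] = ('if-else', p[3], p[5])

    def p_stmt_simple(p):
        'stmt : STMT'
        p[0] = 'stmt'

    def p_error(p):
        print('syntax error')

    lex.lex()
    yacc.yacc()   # ply warns: 1 shift/reduce conflict

yacc resolves the conflict by shifting, which happens to match the usual "else binds to the nearest if" convention, but grammars in the wild rarely let you off that easily.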
The technique of generating and dumping source code is awkward, and the reasons that made it a necessity back then are no longer relevant. A good parser is simply a function call into a code library.
The technique of tokenising in one pass and then parsing in a second pass is awkward and introduces errors, and again the reasons that made it a necessity back then are no longer relevant. A good parser works "on-line" (a term of art; not meaning "over a computer network" here), tokenising and parsing at the same time, in a single pass.
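To illustrate those last two points, here is a minimal sketch of a hand-written, on-line recursive-descent parser for a made-up toy expression grammar: the parser pulls tokens lazily from a generator, so lexing and parsing are interleaved in a single pass, and the whole thing is exposed as one ordinary function call:

    # Minimal sketch (all names and the grammar are made up): an
    # "on-line" parser that tokenises and parses in a single pass.
    import re

    TOKEN_RE = re.compile(r'\s*(?:(?P<NUM>\d+)|(?P<OP>[+*()]))')

    def tokens(src):
        """Lazily yield (kind, text) pairs, one token at a time."""
        src = src.rstrip()
        pos = 0
        while pos < len(src):
            m = TOKEN_RE.match(src, pos)
            if not m:
                raise SyntaxError(f"bad character at {pos}: {src[pos]!r}")
            pos = m.end()
            yield (m.lastgroup, m.group(m.lastgroup))
        yield ('EOF', '')

    class Parser:
        """Recursive descent driven directly by the token generator,
        so the lexer only runs when the parser asks for a token."""
        def __init__(self, src):
            self._toks = tokens(src)
            self.tok = next(self._toks)   # one token of lookahead

        def _advance(self):
            self.tok = next(self._toks)

        def _expect(self, text):
            if self.tok[1] != text:
                raise SyntaxError(f"expected {text!r}, got {self.tok[1]!r}")
            self._advance()

        # expr ::= term ('+' term)* ; term ::= factor ('*' factor)* ;
        # factor ::= NUM | '(' expr ')'
        def expr(self):
            value = self.term()
            while self.tok[1] == '+':
                self._advance()
                value += self.term()
            return value

        def term(self):
            value = self.factor()
            while self.tok[1] == '*':
                self._advance()
                value *= self.factor()
            return value

        def factor(self):
            kind, text = self.tok
            if kind == 'NUM':
                self._advance()
                return int(text)
            self._expect('(')
            value = self.expr()
            self._expect(')')
            return value

    def parse(src):
        """The whole parser as a single function call."""
        p = Parser(src)
        result = p.expr()
        if p.tok[0] != 'EOF':
            raise SyntaxError(f"trailing input: {p.tok[1]!r}")
        return result

    print(parse("2 + 3 * (4 + 1)"))   # -> 17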
The book precedes Unicode by a long time, and you will not learn from it how to properly deal with text according to the rules laid out in Unicode's various relevant reports and annexes.
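For a taste of what that involves, here is a minimal sketch of Unicode-aware identifier handling in a lexer, loosely following UAX #31 (identifier syntax) and UAX #15 (normalization); the function name is invented, and Python's str.isidentifier() is doing the XID_Start/XID_Continue work:

    # Minimal sketch (the function name is invented): identifier
    # handling per the spirit of UAX #31 and UAX #15. Distinct code
    # point sequences can render identically ("é" as U+00E9 vs
    # U+0065 U+0301), so normalise to NFC before comparing names.
    import unicodedata

    def normalize_identifier(name: str) -> str:
        nfc = unicodedata.normalize('NFC', name)
        if not nfc.isidentifier():   # XID_Start/XID_Continue check
            raise SyntaxError(f"invalid identifier: {name!r}")
        return nfc

    # Both spellings of "café" end up as the same symbol-table key:
    assert normalize_identifier("caf\u00e9") == normalize_identifier("cafe\u0301")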
The book does not take into consideration the syntactic and semantic niceties and features that regexes have gained since, which by now should definitely also be part of any grammar/parser toolkit.
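As a small illustration (the token names and toy syntax are invented): a lexer specification today can lean on named groups and verbose mode with inline comments, conveniences well beyond the bare regular expressions of the classic texts:

    # Minimal sketch (token names and toy syntax invented): a lexer
    # spec as one pattern, using named groups and VERBOSE mode.
    import re

    TOKEN_SPEC = re.compile(r"""
        (?P<NUMBER>  \d+ (?: \.\d+ )? )   # integer or decimal literal
      | (?P<NAME>    [A-Za-z_]\w* )       # identifier
      | (?P<OP>      [+\-*/=()] )         # single-character operator
      | (?P<SKIP>    [ \t]+ )             # whitespace, discarded
    """, re.VERBOSE)

    def lex(src):
        pos = 0
        while pos < len(src):
            m = TOKEN_SPEC.match(src, pos)
            if not m:
                raise SyntaxError(f"bad character: {src[pos]!r}")
            pos = m.end()
            if m.lastgroup != 'SKIP':
                yield (m.lastgroup, m.group())

    print(list(lex("x = 3.14 * r")))
    # [('NAME', 'x'), ('OP', '='), ('NUMBER', '3.14'), ('OP', '*'), ('NAME', 'r')]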
> recommend any other learning resources
Depends on what your goals are. For a broad, shallow theoretical introduction and to see what's out there, browse the slide decks of university compiler courses on the web.
Are you sure it's an extinct art, though? LLVM is flourishing; many interesting IRs like MLIR are coming to life; many ML-adjacent projects build their own compilers (PyTorch, Mojo, tinygrad); big tech companies like Intel, AMD, Nvidia, and Apple contribute to multiple different compilers; and projects integrate with one another at different levels of abstraction (PyTorch -> Triton -> CUDA). There is a lot of compilation going on from one language to another.
Not to mention the many mainstream languages that weren't that popular 10 years ago - think Rust, Zig, Go.
Do you distinguish between writing a compiler and writing an optimizing compiler, and if so, how is writing an optimizing compiler an extinct art?
Equality saturation, dominator graphs, chordal register allocation, hardware/software codesign, etc. - there are many new avenues of research for compilers, and those are just the ones off the top of my head that are relevant to my work. Most optimization work is R&D, and much of it is left unimplemented at scale; things like the phase-ordering problem and IR validation are hard to do in practice, even given ample resources and time.
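Of those, dominance is probably the easiest to make concrete. Here is a minimal sketch of the textbook iterative data-flow computation of dominator sets; the example CFG is invented, and production compilers typically use faster algorithms such as Cooper-Harvey-Kennedy or Lengauer-Tarjan:

    # Minimal sketch: dominator sets via the classic iterative
    # data-flow algorithm, Dom(n) = {n} union the intersection of
    # Dom(p) over all predecessors p of n.
    def dominators(cfg, entry):
        # cfg maps each node to its list of successors.
        preds = {n: set() for n in cfg}
        for n, succs in cfg.items():
            for s in succs:
                preds[s].add(n)

        nodes = set(cfg)
        dom = {n: set(nodes) for n in nodes}  # start maximal, shrink to a fixpoint
        dom[entry] = {entry}

        changed = True
        while changed:
            changed = False
            for n in nodes - {entry}:
                if not preds[n]:
                    continue  # unreachable from entry; ignored in this sketch
                new = {n} | set.intersection(*(dom[p] for p in preds[n]))
                if new != dom[n]:
                    dom[n] = new
                    changed = True
        return dom

    # Diamond CFG: entry -> a, b; a -> exit; b -> exit.
    cfg = {'entry': ['a', 'b'], 'a': ['exit'], 'b': ['exit'], 'exit': []}
    print(dominators(cfg, 'entry')['exit'])  # {'entry', 'exit'}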
A good counterpoint is that a lot of the information about this is dense, cryptic, weird, confusing, and hard to get.
The major problem is not finding the sophisticated things, but understanding how to do them in simple-ish ways.
Doing otherwise is a major waste of time!
P.S.: And yes, even once you get the basics and learn the jargon, it is still a problem to find the neat tricks, but by then you have likely figured out that there is nothing like reading the source... (sadly that source is in C or, worse, C++, though lately, with Rust gaining traction, at least it makes more sense!)
Why? Is the mechanism that complicated? I'm pretty sure a medium-format SLR like a Hasselblad or a Rollei SL66 is more... umm, complex.