I've spent a lot of time reading C sources. Standouts are nginx, mbed TLS, Amazon s2n. Clean coding styles, consistent in checking function return values (very important! significant source of vulnerabilities in C software), comments where due, no hacks.
Among the most convoluted codebases I've read is Tor. It works (apparently), and it isn't even very insecure per se (the code is littered with hard asserts that will abort execution if an expected condition isn't met), but it is unnecessarily dense. Example: I use software to analyze the call graph (which function calls which function), and when I ask it to find potentially recursive loops (A() calls B() calls A() etc) it spews out tens of thousands of potential recursions.
By comparison, mbed TLS has only a couple of these, and a large project like OpenSSL has 50 or so.
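To make "potentially recursive loops" concrete, here is a minimal sketch of cycle detection on a call graph using depth-first search. The tool used above isn't named, so this is not how it works internally; the `call_graph` type, `MAX_FUNCS` and `call_graph_has_recursion` are hypothetical names made up for illustration. It only answers whether any cycle exists, whereas counting every distinct cycle (as a real analyzer would) is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 8   /* hypothetical upper bound, enough for a toy graph */

/* Hypothetical call graph: calls[i][j] is true when function i calls function j. */
typedef struct {
    size_t n;                           /* number of functions in use */
    bool calls[MAX_FUNCS][MAX_FUNCS];   /* adjacency matrix */
} call_graph;

enum { UNVISITED = 0, IN_PROGRESS, DONE };

/* Depth-first search from fn. Reaching a node that is still IN_PROGRESS
 * means the current call chain loops back on itself. */
static bool dfs_finds_cycle(const call_graph *g, size_t fn, int *state)
{
    state[fn] = IN_PROGRESS;
    for (size_t callee = 0; callee < g->n; callee++) {
        if (!g->calls[fn][callee])
            continue;
        if (state[callee] == IN_PROGRESS)
            return true;
        if (state[callee] == UNVISITED && dfs_finds_cycle(g, callee, state))
            return true;
    }
    state[fn] = DONE;
    return false;
}

/* Returns 1 if some chain of calls can lead back to its starting function,
 * 0 if the graph is acyclic, and -1 on invalid input. */
int call_graph_has_recursion(const call_graph *g)
{
    int state[MAX_FUNCS] = { UNVISITED };

    if (g == NULL || g->n > MAX_FUNCS)
        return -1;
    for (size_t fn = 0; fn < g->n; fn++) {
        if (state[fn] == UNVISITED && dfs_finds_cycle(g, fn, state))
            return 1;
    }
    return 0;
}
```

With edges A -> B -> C the function returns 0; adding C -> A makes it return 1.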
Conversely, C software that is inconsistent in error signaling (return -1 on error in function A, return 0 in function B, set parameter int* err in function C, etc), doesn't perform due error checking, has a spaghetti call graph, mindlessly performs multiplication (leading to overflows with certain inputs), or uses signed or unsigned int where size_t is better suited is usually susceptible to bugs and abuse (vulnerabilities). The projects I mentioned are very clean in this regard.
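As a rough illustration of the habits listed above, here is a small sketch that uses one error convention throughout (0 on success, a negative code on failure), checks every return value, guards the multiplication against wrap-around, and uses size_t for sizes. It is not taken from any of the projects mentioned; the `RC_*` codes, `checked_mul` and `alloc_array` are hypothetical names invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical convention: every function returns RC_OK (0) on success
 * and a negative code on failure. */
enum { RC_OK = 0, RC_ERR_INVALID = -1, RC_ERR_OVERFLOW = -2, RC_ERR_NOMEM = -3 };

/* Multiply two sizes, refusing to wrap around silently. */
int checked_mul(size_t a, size_t b, size_t *out)
{
    if (out == NULL)
        return RC_ERR_INVALID;
    if (a != 0 && b > SIZE_MAX / a)
        return RC_ERR_OVERFLOW;
    *out = a * b;
    return RC_OK;
}

/* Allocate a zeroed array of count elements. The size computation is
 * checked before any memory is requested. On failure *out is NULL. */
int alloc_array(size_t count, size_t elem_size, void **out)
{
    size_t total;
    int rc;

    if (out == NULL)
        return RC_ERR_INVALID;
    *out = NULL;

    rc = checked_mul(count, elem_size, &total);
    if (rc != RC_OK)
        return rc;              /* propagate the error, never ignore it */
    if (total == 0)
        return RC_ERR_INVALID;

    *out = calloc(1, total);
    return *out != NULL ? RC_OK : RC_ERR_NOMEM;
}
```

A caller would write `if (alloc_array(n, sizeof(int), &p) != RC_OK) { /* handle */ }` and can rely on the same check working for every function in the codebase.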
I have learned a lot from reading the Presto source code and watching it develop. It is written in modern Java 8. The authors are obviously experts in the language, the JVM and the ecosystem. Since it is an MPP SQL engine, performance is very important, and the authors have been able to strike a good balance between performance and clean abstractions. I have also learned a lot about how to evolve a product. Large features are added iteratively. In my own code I often found myself going from Feature 1.0 -> Feature 2.0. Following Presto PRs, I have seen how for large features they go from Feature 1.0 -> Feature 1.1 -> Feature 1.2 -> ... Feature 2.0 very quickly. This is much more difficult than it sounds. How can I implement 10% of a feature, still have it provide benefits and still be able to ship it? I have seen how this technique allows code to make it into production quickly, where it is validated and hardened. In some ways it reminds me of this: https://storify.com/jrauser/on-the-big-rewrite-and-bezos-as-.... You shouldn't be asking for a rewrite. Know where you want to go and carefully plan small steps from here to there.
Asterisk PBX. A well-chosen small set of module types (channel drivers, applications, functions, resources, codecs & formats) lets you implement literally any behaviour and integrate with any conceivable external technology. I haven't worked in VoIP for quite a long time, but the clarity of Asterisk's design has deeply influenced me.
Gstreamer. The pipeline is a very powerful model for software, and its potential is tremendous. Unfortunately I find the level of development & maintenance of the Gstreamer project itself quite poor: the code is horribly complicated for questionable reasons (it's said to be non-blocking everywhere; I find that a bad excuse for being riddled with subtle bugs and for failing to support custom pipelines as blocks for higher-level pipelines).
I find projects such as ffmpeg and the Linux kernel quite well engineered, but have nothing special to say about them except that they are reasonably well organized and get better day by day.
For user-interface apps where high user productivity matters, I find that software such as readline, tmux, mutt and a bunch of others follows the wise pattern of extensible and scriptable software: if you want hotkeys, you need a domain-specific language and bindings must be
I am grateful to work with a few of the Asterisk developers, and they strive hard for quality. A project that long-running and feature-rich is not easy to keep up to date, stable and well architected. If you want to see a project with professional commit messages, it is a solid example (for the past several years at least).
Just in case anyone were to be led to believe this:
Asterisk's code base is a pile of crap.
It's been getting a bit better over the years, but it still is terrible: tons of conceptual blunders, protocol implementations are only loosely inspired by the specification, system APIs are used incorrectly, lots of code doesn't bother with dynamic string lengths, but instead simply truncates strings arbitrarily if they don't fit into some fixed-size buffer, ...
The only reason it kind of works is that bugs that happen often enough do end up being fixed at some point, but that's about it. If you know your C and POSIX APIs and you don't believe me, just go and have a look at the code; I promise you'll find a bug in less than an hour.
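To illustrate the fixed-size buffer complaint, here is a minimal sketch of the two alternatives to silent truncation: detect that the output did not fit and report it, or size the buffer from the data. This is not Asterisk code; `format_address_fixed` and `format_address_dynamic` are hypothetical helpers, and the only library behaviour relied on is the standard C99 `snprintf` return value (the length the full output would have had).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format "user@host" into a caller-supplied buffer.
 * Returns 0 on success, -1 if the result did not fit (or on an output
 * error), so the caller never works with a silently truncated string. */
int format_address_fixed(char *buf, size_t buf_size,
                         const char *user, const char *host)
{
    int needed = snprintf(buf, buf_size, "%s@%s", user, host);

    if (needed < 0 || (size_t)needed >= buf_size)
        return -1;
    return 0;
}

/* Alternative: ask snprintf for the required length first, then allocate.
 * Returns a heap string the caller must free(), or NULL on failure. */
char *format_address_dynamic(const char *user, const char *host)
{
    int needed = snprintf(NULL, 0, "%s@%s", user, host);
    size_t size;
    char *out;

    if (needed < 0)
        return NULL;
    size = (size_t)needed + 1;      /* room for the terminating NUL */
    out = malloc(size);
    if (out == NULL)
        return NULL;
    snprintf(out, size, "%s@%s", user, host);
    return out;
}
```

Either way, the failure becomes visible to the caller instead of turning into a shortened string that something downstream then trusts.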
Yes, I know Asterisk is riddled with bugs and has very nasty spots at its core (e.g. "channel cloning" or whatever it is called). It was my job to debug the code with gdb and valgrind :)
What still amazes me is the set of core design concepts I listed: channels, applications... I have a case for comparison here, a project of comparable complexity where all features are bolted on ad hoc, without the compartmentalization of complexity that Asterisk has.
Discourse is a really solid codebase with some nice patterns (their auth/auth checking, for example); probably the best OSS Rails app I know. I routinely answer questions about how the product or API works with 30 seconds of examination of the code.
edit for details: The authors are quite meticulous (notoriously, every comment in a multi-line comment is 3 characters less than the previous) and stick to the "convention over configuration" mantra no doubt inspired by Ruby on Rails. It's interesting to see how they create abstractions to simplify so many common web dev tasks.
I'm a fan of the underused dlib C++ library[0]. It has a lot of uses, and work transfers across platforms without problems. I know I can do all the work on my Linux machine, and when it comes time to export for Windows I just open up a VM, redownload the repo and compile with cmake, and it just works.
The thing I like about it the most, though, is the examples, which exist for every feature. The person who wrote it actually understands what I want out of an example: I want code I can look at and immediately understand what is going on and why. I want examples I can refer to when mine does not work, so I can compare and see what I did wrong. Take the GUI example[1], for instance: anything that happens that is specific to that example has a comment. It makes no assumptions about your prior knowledge other than that you understand C++.
Vyatta's firewall distribution had some documentation which struck me as being remarkably well-written back in the day. Usage appeared to be well thought out. Don't know if their code is nice or whatever but if other aspects are any indication, I'd imagine it too is well done.
RxSwift https://github.com/ReactiveX/RxSwift is gorgeous. Cycle.js and RxJS also. Chromium + LLVM also (minus the x-platform parts but those suck everywhere).
If anyone cares about GTK+ 3 and C, I would recommend gnome-recipes[0]. It is a well-written and smaller codebase (it is still in active development, so not yet feature complete).
It should be helpful for learning object-oriented design in C with GObject.
I'm only familiar with varnish 2.1, but as to that version I think it's a bit of a stretch to say varnish is well written. VCL is very complicated - just check the request flow diagram [1]. Some of the documentation is very poor - try to find out the properties available on beresp for example (you have to grep the source code [2]), or try to understand the precise function and implications of grace mode, saint mode, or hit_for_pass. The best redeeming quality is varnishtest and some of the other tools that are provided.
Digging into the code reviews of Guava is impressive: if you've ever felt a code reviewer was being too strict, that's probably nothing compared to Guava reviews. And it shows in the quality of the library.
I implemented a data structure similar to one in Guava and I thought my code was pretty good. I looked at the Guava source out of curiosity and immediately refactored my data structure.
I ended up refactoring it again, and the code is still not as clear as Guava's.
For instance, I noticed they were using an enum for functions and I was like, WTF, who does that? Later I decided to make my library serializable so we could save to disk. Well, it turns out that's exactly why they used an enum. My solution was to make a utility class to wrap the non-serializable objects, but their solution was much clearer and less code.
I have used the Mono source code as a go-to reference for poorly documented .NET Framework code, because it was usually clean, quality code. Some quick comparisons with the CoreCLR and .NET reference sources also supported my impression. (A lot of code is being merged from Mono now.)