Ask HN: What are some well written/engineered open source software?

guidovranken · on Feb 12, 2017

I've spent a lot of time reading C sources. Standouts are nginx, mbed TLS, Amazon s2n. Clean coding styles, consistent in checking function return values (very important! significant source of vulnerabilities in C software), comments where due, no hacks.

Among the most convoluted codebases I've read is Tor. It works (apparently), and it isn't even very insecure per se (the code is littered with hard asserts that abort execution if an expected condition isn't met), but it is unnecessarily dense. Example: I use software to analyze the call graph (which function calls which function), and when I ask it to find potentially recursive loops (A() calls B() calls A(), etc.), it spews out tens of thousands of potential recursions.

By comparison, mbed TLS has only a couple of these, and a large project like OpenSSL has 50 or so.
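The kind of recursion check described above can be sketched roughly as follows. This is only an illustration, not the tool actually used: the call graph is assumed to be already extracted into a plain dictionary, and `find_recursive_cycles` is a hypothetical helper name. It does a depth-first walk and reports one loop for each call that leads back to a function already on the current chain.

```python
from typing import Dict, List, Set

# Hypothetical call graph: each function name maps to the functions it calls.
CallGraph = Dict[str, List[str]]


def find_recursive_cycles(graph: CallGraph) -> List[List[str]]:
    """Return call chains that lead back to a function already on the chain."""
    cycles: List[List[str]] = []
    finished: Set[str] = set()  # functions whose callees are fully explored

    def visit(func: str, chain: List[str]) -> None:
        if func in chain:
            # func is already on the current chain, so its tail forms a loop.
            cycles.append(chain[chain.index(func):] + [func])
            return
        if func in finished:
            return
        chain.append(func)
        for callee in graph.get(func, []):
            visit(callee, chain)
        chain.pop()
        finished.add(func)

    for func in graph:
        visit(func, [])
    return cycles


if __name__ == "__main__":
    example = {"A": ["B"], "B": ["A", "C"], "C": []}
    print(find_recursive_cycles(example))
```

A statically extracted call graph tends to over-approximate (for example, when calls go through function pointers), which is presumably why the results are called "potential" recursions.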

Conversely, C software that is inconsistent in error signaling (return -1 on error in function A, return 0 in function B, set a parameter int* err in function C, etc.), doesn't perform due error checking, has a spaghetti call graph, mindlessly performs multiplication (leading to overflows with certain inputs), or uses signed or unsigned int where size_t is better suited is usually susceptible to bugs and abuse (vulnerabilities). The projects I mentioned are very clean in this regard.
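To make the multiplication point concrete, here is a small arithmetic sketch. Python is used only to do the math, since its integers do not overflow; `SIZE_MAX` assumes a platform with a 64-bit size_t, and `wrapped_mul` and `checked_mul` are hypothetical names. The division-based comparison in `checked_mul` is the same test one could write in C before multiplying.

```python
SIZE_MAX = 2**64 - 1  # assumption: size_t is 64 bits wide on this platform


def wrapped_mul(count: int, size: int) -> int:
    """What an unchecked size_t multiplication would yield (wraps modulo 2**64)."""
    return (count * size) & SIZE_MAX


def checked_mul(count: int, size: int) -> int:
    """Multiply two sizes, raising instead of silently wrapping around."""
    if count < 0 or size < 0:
        raise ValueError("sizes must be non-negative")
    # Division-based test: it cannot overflow itself, so it is safe to run
    # before the multiplication.
    if size != 0 and count > SIZE_MAX // size:
        raise OverflowError("count * size does not fit in size_t")
    return count * size


if __name__ == "__main__":
    count, size = 2**61, 8          # e.g. an attacker-controlled element count
    print(wrapped_mul(count, size))  # the size an unchecked allocation would request
    try:
        checked_mul(count, size)
    except OverflowError as exc:
        print("rejected:", exc)
```

With these inputs the true product is 2**64, which wraps to 0, so an unchecked allocation would request zero bytes while later code still believes it has room for 2**61 elements.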

nemild · on Feb 12, 2017

Noted below, but I'd also highly recommend Redis for clean code.

samlewis · on Feb 12, 2017

What software do you use to analyze call graphs?

undergrowth54 · on Feb 12, 2017

> it is unnecessarily dense

Underhanded perhaps?

nfa_backward · on Feb 12, 2017

Facebook Presto, an MPP SQL engine written in Java.

https://github.com/prestodb/presto

I have learned a lot from reading the source code and watching it develop. It is written in modern Java 8. The authors are obviously experts in the language, the JVM and the ecosystem. Since it is an MPP SQL engine, performance is very important, and the authors have struck a good balance between performance and clean abstractions. I have also learned a lot about how to evolve a product. Large features are added iteratively. In my own code I often found myself going from Feature 1.0 -> Feature 2.0. Following Presto PRs, I have seen how for large features they go from Feature 1.0 -> Feature 1.1 -> Feature 1.2 -> ... Feature 2.0 very quickly. This is much more difficult than it sounds. How can I implement 10% of a feature, still have it provide benefits and still be able to ship it? I have seen how this technique allows code to make it into production quickly, where it is validated and hardened. In some ways it reminds me of this: https://storify.com/jrauser/on-the-big-rewrite-and-bezos-as-.... You shouldn't be asking for a rewrite. Know where you want to go and carefully plan small steps from here to there.

andrey_utkin · on Feb 12, 2017

Asterisk PBX. A well-chosen small set of module types (channel drivers, applications, functions, resources, codecs & formats) makes it possible to implement literally any behaviour and to integrate with any conceivable external technology. I haven't worked in VoIP for quite a long time, but the clarity of Asterisk's design has deeply influenced me.

GStreamer. The pipeline is a very powerful model for software, and its potential is tremendous. Unfortunately, I find the level of development & maintenance of the GStreamer project itself quite poor: the code is horribly complicated for questionable reasons (it's said to be non-blocking everywhere; I find that a bad excuse for being riddled with subtle bugs and for failures when using custom pipelines as blocks for higher-level pipelines).

I find projects such as ffmpeg and the Linux kernel quite well engineered, but I have nothing special to say about them except that they are reasonably well organized and get better day by day.

For user-interface apps where high user productivity matters, I find that software such as readline, tmux, mutt and a bunch of others follows the wise pattern of extensible, scriptable software: if you want hotkeys, you need a domain-specific language, and bindings must be

  key: action[, action...]

not

  action: key
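A minimal sketch of the key-first form, assuming a made-up binding syntax and hypothetical action names (`save`, `close`); none of this is taken from readline, tmux or mutt. It shows what the `key: action[, action...]` shape buys you: one key can run a sequence of actions, and the same action can appear under several keys, neither of which `action: key` expresses directly.

```python
from typing import Callable, Dict, List


# Hypothetical actions; a real program would expose its own commands here.
def save(log: List[str]) -> None:
    log.append("save")


def close_window(log: List[str]) -> None:
    log.append("close")


ACTIONS: Dict[str, Callable[[List[str]], None]] = {
    "save": save,
    "close": close_window,
}


def parse_bindings(text: str) -> Dict[str, List[str]]:
    """Parse lines of the form 'key: action[, action...]' into a key map."""
    bindings: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, actions = line.partition(":")
        bindings[key.strip()] = [a.strip() for a in actions.split(",") if a.strip()]
    return bindings


def press(key: str, bindings: Dict[str, List[str]], log: List[str]) -> None:
    """Run every action bound to the key, in order; unbound keys do nothing."""
    for name in bindings.get(key, []):
        ACTIONS[name](log)


if __name__ == "__main__":
    config = "C-s: save\nC-q: save, close\n"
    history: List[str] = []
    press("C-q", parse_bindings(config), history)
    print(history)
```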

ruffrey · on Feb 12, 2017

I am grateful to work with a few of the Asterisk developers, and they strive hard for quality. A project that long-running and feature-rich is not easy to keep up to date, stable and well architected. If you want to see a project with professional commit messages, it is a solid example (for the past several years at least).

https://github.com/asterisk/asterisk/commits/master

zAy0LfpBZLC8mAC · on Feb 12, 2017

Just in case anyone were to be led to believe this:

Asterisk's code base is a pile of crap.

It's been getting a bit better over the years, but it is still terrible: tons of conceptual blunders, protocol implementations that are only loosely inspired by the specification, system APIs used incorrectly, lots of code that doesn't bother with dynamic string lengths but instead simply truncates strings arbitrarily if they don't fit into some fixed-size buffer, ...

The only reason it kind of works is that bugs that happen often enough do end up being fixed at some point, but that's about it. If you know your C and POSIX APIs and you don't believe me, just go and have a look at the code. I promise you'll find a bug in less than an hour.

andrey_utkin · on Feb 12, 2017

Yes I know Asterisk is ridden with bugs and has very nasty spots at its core (e.g. "channel cloning" or whatever it is called). It was my job to debug the code with gdb and valgrind :)

What is still amazing to me is the set of core design concepts that I listed: channels, applications... I have a case for comparison here, where the project is of comparable complexity but all features are bolted on ad hoc, without the compartmentalization of complexity that Asterisk has.

sayelt · on Feb 12, 2017

This is surprising to read considering the following article:

https://freeswitch.org/how-does-freeswitch-compare-to-asteri...

FreeSWITCH is an alternative to Asterisk.

gaelow · on Feb 12, 2017

If the explanation looks like this, I am not sure I want to see what the code looks like.

http://imgur.com/a/E0idH

rch · on Feb 11, 2017

You might find this resource helpful:

http://aosabook.org/en/index.html

cure · on Feb 12, 2017

Anything written by djb (https://en.wikipedia.org/wiki/Daniel_J._Bernstein): qmail, djbdns, ucspi-tcp, daemontools, etc.

tjalfi · on Feb 12, 2017

I'll second the recommendation for djb software. http://perl.plover.com/yak/qmail/ has slides from a presentation about qmail internals.

patio11 · on Feb 12, 2017

Discourse is a really solid codebase with some nice patterns (their auth/auth checking, for example); probably the best OSS Rails app I know. I routinely answer questions about how the product or API works with 30 seconds of examination of the code.

nstart · on Feb 12, 2017

Thank you! Came here to mention them. Thrilled to see you feel the same way about the readability of that code base.

mr_anich · on Feb 12, 2017

I've learned quite a bit from reading through the Laravel source - https://github.com/illuminate

edit for details: The authors are quite meticulous (notoriously, every comment in a multi-line comment is 3 characters less than the previous) and stick to the "convention over configuration" mantra no doubt inspired by Ruby on Rails. It's interesting to see how they create abstractions to simplify so many common web dev tasks.

t20n · on Feb 12, 2017

I especially like the collection class: https://github.com/illuminate/support/blob/master/Collection... It reads almost like natural language.

Cieplak · on Feb 12, 2017

Erlang OTP

https://github.com/erlang/otp

norswap · on Feb 12, 2017

Many names put out there, but not much substantiation. If you are going to drop a name, could you explain why it is well written/engineered?

camus2 · on Feb 12, 2017

One could argue the question is way too vague. What is "well written/engineered" software to begin with?

vitoc · on Feb 12, 2017

I like the engineering aspects of VS Code:

https://github.com/Microsoft/vscode

SirensOfTitan · on Feb 11, 2017

xmonad https://github.com/xmonad/xmonad

kornish · on Feb 12, 2017

Seconded. For anyone interested in looking at a real-world Haskell codebase, this is a classic.

dfan · on Feb 12, 2017

The Stockfish chess engine: https://github.com/official-stockfish/Stockfish

I learned a ridiculous amount from reading the source code to TeX (https://www.amazon.com/Computers-Typesetting-B-TeX-Program/d...) but it is written in a very 1970s style.

frunzales · on Feb 11, 2017

Take a look at PostgreSQL.

scotty79 · on Feb 11, 2017

SQLite might be a good bet too, especially on the engineering side. I have their famous test suite in mind.

bungle · on Feb 12, 2017

I think Lua deserves to be added here: https://www.lua.org/source/5.3/

terrble · on Feb 12, 2017

Chromium

edit: https://www.chromium.org/developers/design-documents

c_shu · on Feb 12, 2017

Isn't Chrome quite buggy and leaky? (for any user of Chrome.) The same goes for Chromium, right?

wazanator · on Feb 14, 2017

I'm a fan of the underused dlib C++ library[0]. It has a lot of uses, and work transfers across platforms with no problem. I know I can do all the work on my Linux machine; then, when it comes time to export for Windows, I just open up a VM, redownload the repo, compile with cmake, and it just works.

The thing I like most about it, though, is the examples, which exist for every feature. The person who wrote it actually understands what I want out of an example: I want code I can look at and immediately understand what is going on and why. I want examples I can refer to when mine does not work, so I can compare and see what I did wrong. Take the GUI example[1], for instance: anything that happens that is specific to that example has a comment. It makes no assumptions about your prior knowledge other than that you understand C++.

[0] http://dlib.net/

[1] http://dlib.net/gui_api_ex.cpp.html

rkwasny · on Feb 12, 2017

Redis, the most cleanly written and easily extensible code in C you can find.

nickpsecurity · on Feb 12, 2017

OpenBSD for correctness and avoiding bloat. One of them told me MuPDF was cleanly coded, too. Rare for PDF readers.

cdvonstinkpot · on Feb 11, 2017

Vyatta's firewall distribution had some documentation which struck me as being remarkably well-written back in the day. Usage appeared to be well thought out. Don't know if their code is nice or whatever but if other aspects are any indication, I'd imagine it too is well done.

CoolGuySteve · on Feb 12, 2017

Quake 2 and Quake 3.

So far, it's the cleanest code I've ever worked with while still being very self-contained.

shanemhansen · on Feb 11, 2017

I really like reading the Go standard library and runtime source.

adamnemecek · on Feb 11, 2017

RxSwift https://github.com/ReactiveX/RxSwift is gorgeous. Cycle.js and RxJS also. Chromium + LLVM also (minus the x-platform parts but those suck everywhere).

danielvf · on Feb 11, 2017

Redis. SQLite.

Zikes · on Feb 12, 2017

Second Redis. I don't even know C yet I find it surprisingly easy to follow.

xyzzy_plugh · on Feb 12, 2017

I've always found the git source a pleasure to read.

PretzelFisch · on Feb 12, 2017

Can you really learn from just reading source code? It seems like you need an annotated guide to understand why this was done along with the how.

inapis · on Feb 12, 2017

Somewhat. I just dug through Laravel's source code and the comments helped.

Having an annotated guide for each software would be difficult but all of us have to start somewhere.

mhluongo · on Feb 12, 2017

Commit messages help.

pksadiq · on Feb 12, 2017

If anyone cares about GTK+ 3 and C, I would recommend gnome-recipes[0]. It is a well-written, small codebase (still in active development, so not yet feature complete).

It should be helpful for learning object-oriented design in C with GObject.

[0] https://wiki.gnome.org/Apps/Recipes

sdfiogjijd · on Feb 11, 2017

* PostgreSQL

* Varnish Cache

* qmail

* Mercury Programming Language

greenleafjacob · on Feb 12, 2017

I'm only familiar with varnish 2.1, but as to that version I think it's a bit of a stretch to say varnish is well written. VCL is very complicated - just check the request flow diagram [1]. Some of the documentation is very poor - try to find out the properties available on beresp for example (you have to grep the source code [2]), or try to understand the precise function and implications of grace mode, saint mode, or hit_for_pass. The best redeeming quality is varnishtest and some of the other tools that are provided.

[1] http://book.varnish-software.com/3.0/_images/request.png

[2] https://github.com/varnishcache/varnish-cache/blob/2.1/lib/l...

smcleod · on Feb 13, 2017

PostgreSQL, Nginx, fio, SublimeText (3), nmap, libcurl (and curl itself), ffmpeg (parts are also in asm), rsync, XLD, and the list goes on...

Watch out for biases based on how much people like the end product vs how well it's actually implemented though.

thiht · on Feb 12, 2017

Lua's source code is very nice to read, even if you're not a C guy

chromanoid · on Feb 12, 2017

http://netty.io

http://infinispan.org

vandyswa · on Feb 12, 2017

https://github.com/vandys/vsta

Especially the kernel in src/os

ptrptr · on Feb 11, 2017

Answering the question - that would be Blender.

informatimago · on Feb 12, 2017

Postfix is a good example of a system written in C with separate components (running in different processes for security).

rmu09 · on Feb 12, 2017

http://www.jclark.com/sp/

robertcope · on Feb 12, 2017

I always thought Postfix was nice. Maybe I'm wrong as I haven't seen it mentioned.

deepnotderp · on Feb 12, 2017

TensorFlow.

juancn · on Feb 12, 2017

LLVM is a fantastic example of well written C++ code.

bobosha · on Feb 11, 2017

Apache Solr and Tomcat

guard-of-terra · on Feb 12, 2017

Solr source code is a mess and sometimes worse than that. Test coverage is pretty good tho.

c_shu · on Feb 12, 2017

Boost

gaze · on Feb 12, 2017

Newos

rokosbasilisk · on Feb 12, 2017

django

throwawaydbfif · on Feb 12, 2017

Google's Guava library, at least the parts I've seen, is incredibly well written and organized.

I assume most "standard library" type stuff is where you will find the cleanest code.

euyyn · on Feb 12, 2017

Digging into the code reviews of Guava is impressive: if you've ever felt a code reviewer was being too strict, that's probably nothing compared to Guava reviews. And it shows in the quality of the library.

throwawaydbfif · on Feb 12, 2017

I implemented a data structure similar to one in guava and I thought my code was pretty good. I looked at guava source out of curiosity and immediately refactored my data structure.

I ended up refactoring it again, and the code is still not as clear as guava.

For instance, I noticed they were using an enum for functions and I was like WTF who does that? Later I decided to make my library serializable so we can save to disk. Well, turns out that's exactly why they used an enum. My solution was to make a utility class to wrap the non-serializable objects but their solution was much clearer and less code

kodfodrasz · on Feb 13, 2017

I have used the Mono source code as a go-to reference for poorly documented .NET Framework code, because it was usually clean, quality code. Some quick comparisons with the coreclr and .NET reference sources also supported my impression. (A lot of code is being merged from Mono now.)

apeacox · on Feb 11, 2017

A *BSD OS

bch · on Feb 12, 2017

For example, NetBSD, which also had a book[0] written about it.

[0] http://www.spinellis.gr/codereading/