It's unbelievable how hard distributed systems are to grasp. I recently implemented Paxos in Rust, and at certain points I literally thought I was losing my mind.
When you read Paxos Made Simple it really all seems so, well, simple. But then you get inconsistent commits and look at the traces of what happened and just go "How?!"
One of the things that surprised me about this analysis was just how many bugs we found that had to do with the actual Raft implementation. Usually when I test Raft-based systems the bugs are at the edges--like the coupling of the system to the Raft library, treating it like an externally-queryable log rather than the driver of a state machine, and so on. We found integration bugs here too, but also a fair number of issues in the Raft library itself--and this is despite Redis-Raft having existing integration tests!
Has anyone approached Jepsen about running an analysis on the Erlang Ra implementation? I believe they've been running Jepsen tests internally, just curious if they're thinking about getting an official analysis at some point. Thanks for all that you folks do!!
* https://github.com/rabbitmq/ra
Pre-existing! It's a fork of willemt's https://github.com/willemt/raft/, which has been around since 2013, and has property-based fuzz testing! It really does look like it's got its own extensive tests; I'm surprised we found issues.
I'm curious if there's research into "better" primitives in Programming Languages in order to simplify writing distributed systems, analogous to how concurrency primitives beyond Mutexes, Semaphores, and Condition Variables (like Futures, Monitors, etc. or approaches such as Actors) can greatly simplify logic and enhance one's ability to reason about code. Or things like the Rust borrow checker.
There's a lot of research into this, actually! Folks have been working on ways to extract executable code from Alloy, TLA+, Isabelle/HoL, and Coq specifications. That doesn't help with implementations which don't use codegen though--and it doesn't help you with the parts of the program that aren't formalized.
There are two general issues here: there's the nuts and bolts, and then there's the emergent properties of the protocol.
In my experience, async (which includes futures/promises and actor-like mechanisms) makes the nuts-and-bolts problems, like avoiding data races, avoiding deadlock, and managing multiple things going on at once, way easier.
You still need fuzzing and model checking to make sure you got the strategic stuff right.
That said, the team I work on is about to release our first Raft-based product, so I might have a different opinion in a few months.
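The actor point above can be sketched concretely. Here's a minimal Rust example (all names here, like `Msg` and `spawn_counter`, are invented for illustration, not from any real library): one thread owns the state, and every mutation goes through a message channel, so there are no shared-memory races or lock-ordering deadlocks to reason about.

```rust
use std::sync::mpsc;
use std::thread;

// Messages the "actor" understands: mutate, or ask for the value.
enum Msg {
    Incr,
    Get(mpsc::Sender<u64>), // reply channel for reads
}

// Spawn a thread that exclusively owns the counter state. Callers
// interact only through the returned Sender, never the state itself.
fn spawn_counter() -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut count: u64 = 0; // owned by this thread alone
        for msg in rx {
            match msg {
                Msg::Incr => count += 1,
                Msg::Get(reply) => {
                    let _ = reply.send(count);
                }
            }
        }
    });
    tx
}
```

The design point is that the channel serializes all access, which is exactly what a Mutex does, but without the possibility of forgetting to take the lock or taking two locks in the wrong order.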
That's because the core Paxos protocol is relatively simple. However, no production service could ever survive with just that. Once you start considering all the features and optimizations you will really need, things get very hairy.
Well, Paxos is way more complex than Raft. I'm not saying building on top of Raft is easy; I'm saying making an MVP Raft implementation is easier than a Paxos one.
One thing I wish Raft had: a learner role, which acts like a follower that can't start an election until it has caught up with the rest of the cluster. etcd has it, but I wish it were part of Raft itself, along with bulk log transfer.
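The learner idea above boils down to one extra guard on role transitions. Here's a hedged Rust sketch (the types `Role`, `Node`, and `try_promote` are invented for illustration; this is not etcd's API): a learner receives log entries but ignores election timeouts, and it is promoted to a voting follower only once its log has caught up with the leader's commit index.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Role {
    Learner,  // receives log entries, never votes or starts elections
    Follower, // normal voting participant
}

struct Node {
    role: Role,
    last_applied: u64, // index of the last log entry applied locally
}

impl Node {
    // A learner may only become a voting follower once it has caught
    // up with the leader's commit index.
    fn try_promote(&mut self, leader_commit: u64) {
        if self.role == Role::Learner && self.last_applied >= leader_commit {
            self.role = Role::Follower;
        }
    }

    // Election timeouts are simply ignored while in the learner role,
    // so a freshly added, lagging node can never disrupt the cluster
    // by triggering an election it cannot win.
    fn can_start_election(&self) -> bool {
        self.role == Role::Follower
    }
}
```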
Article pointing out a very common issue that anyone who tried implementing raft runs into:
Letting a follower forward requests to the leader on a client's behalf is not easy to implement correctly, which is why the most popular Raft-based software (the HashiCorp stack) doesn't do it. Not worth it.
> Letting a follower forward requests to the leader on a client's behalf is not easy to implement correctly, which is why the most popular Raft-based software (the HashiCorp stack) doesn't do it. Not worth it.
I'm honestly surprised by this comment! I've written multiple Raft implementations, and request proxying was one of the easiest things to get right--it doesn't have to touch the Raft subsystem at all. Could you talk a little more about this?
Proxying isn't hard to implement, but in the Erlang Raft implementation, Ra, we decided to do it only when the client explicitly declares that it does not care about ordering. When proxying, it is always possible that the client will discover the current leader while the proxied request is still in flight. This may be more of a problem inside Erlang, where it is easy to address any process within the Erlang cluster.
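The "doesn't have to touch the Raft subsystem" claim can be illustrated with a routing layer that sits entirely in front of consensus. This is a hypothetical sketch (the `RouteResult` and `route` names are invented, not from Ra or any HashiCorp library): each node either applies a request locally as leader, forwards it to its current best guess at the leader, or tells the client to retry. The ordering caveat above is exactly the gap this sketch leaves open: while a forwarded request is in flight, the client may learn the new leader and send a second copy, so without idempotence or deduplication, ordering guarantees weaken.

```rust
#[derive(Debug, PartialEq)]
enum RouteResult {
    AppliedLocally,  // we are the leader; submit through Raft
    ForwardTo(u64),  // proxy to this node id on the client's behalf
    Unavailable,     // no known leader; client should back off and retry
}

// Pure routing decision: consensus state is only read (who is leader?),
// never modified, so this layer stays outside the Raft subsystem.
fn route(my_id: u64, known_leader: Option<u64>) -> RouteResult {
    match known_leader {
        Some(l) if l == my_id => RouteResult::AppliedLocally,
        Some(l) => RouteResult::ForwardTo(l),
        None => RouteResult::Unavailable,
    }
}
```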
I don't see anything in this blog that even touches on "distributed systems are hard". Every issue in here should be filed under "Redis has no tests". If you follow basic software engineering principles, you'll find distributed systems easier to approach.
My reading of the article's introduction is that Redis is adding this feature and is (among other things, I'm sure) paying Jepsen to test it. So this is them having tests.
> If you follow basic software engineering principles, you'll find distributed systems easier to approach.
When I implemented Paxos I had tests, and when they failed they spit out an exact trace of what happened, in what order, and on which node. Sometimes it was still excruciating to figure out what went wrong. Here's[1] a comment which you can think of as a bug tombstone. It took me half a day to figure out, even after I had a trace of the issue to analyze.
Sure, but now imagine you have no confidence that any part of your paxos implementation works at all, nevermind the paxos part. That's my impression of issue #13 from the article: not only did the software not pass the test, it's clear that nobody ever even tried to use it, at all!
Full-scale blackbox testing of a database system is similar to dogfooding. You only use it when you have high confidence that you have exhausted the possibilities of unit and integration tests. It's clear this project did not start with exhaustive unit tests.
It reminds me a bit of FoundationDB, which is also a terrible program nobody should entrust with data they ever want to see again. The first time I tried to use it, it ran out of memory and crashed in about ten seconds. I found the problem: their huge-page-aware allocator, which has no tests, had never actually been run by anybody on a machine with huge pages. It was a core library of a released database which had never been executed by anyone. This Redis thing is the same: nobody had ever said "RAFT SET foo bar"; if they had, they would have seen the problem right away.
I'm hesitant to draw too strong a conclusion here, and I can't speak for the Redis Labs team, but I do suspect that this is somewhere where... having an outside tester, like Jepsen (or a suitably adversarial QA team) can help detect missing-stairs sorts of problems. Coming from the perspective of a prospective operator (and having some experience with testing distributed systems), I immediately said "of course I want proxy mode by default", when this wasn't how the Redis-Raft designers necessarily intended things to be used--they intended smart clients to make it so that users wouldn't actually need proxy mode, so they hadn't focused on testing it that way.
Fair enough. I think I misinterpreted the "easier to approach" part of your original answer. Sorry if my answer came across as defensive. My wounds are still fresh. ;)
Part of this is release timing--the Dgraph and Redis analyses were mainly done in Q1 of this year, and I was able to write and release the Mongo & PostgreSQL reports on my own schedule. Wish I were that productive! ;-)
The scrutiny this release is going through makes me confident that the Redis Labs team will deliver in the end.
Also, if you are looking for a linearly scalable distributed pub-sub with strong guarantees around consensus and message persistence, it might be worth looking at Apache Pulsar.
Small typo: I believe the link in the sentence "Tangentially, we were surprised to discover that Redis Enterprise's claim of 'full ACID compliance'..." was copy/pasted incorrectly.