It's unbelievable how hard distributed systems are to grasp. I recently implemented Paxos in Rust, and at certain points I literally thought I was losing my mind.
When you read Paxos Made Simple it really all seems so, well, simple. But then you get inconsistent commits and look at the traces of what happened and just go "How?!"
One of the things that surprised me about this analysis was just how many bugs we found that had to do with the actual Raft implementation. Usually when I test Raft-based systems the bugs are at the edges--like the coupling of the system to the Raft library, treating it like an externally-queryable log rather than the driver of a state machine, and so on. We found integration bugs here too, but also a fair number of issues in the Raft library itself--and this is despite Redis-Raft having existing integration tests!
Has anyone approached Jepsen about running an analysis on the Erlang Ra implementation? I believe they've been running Jepsen tests internally, just curious if they're thinking about getting an official analysis at some point. Thanks for all that you folks do!!
* https://github.com/rabbitmq/ra
Pre-existing! It's a fork of willemt's https://github.com/willemt/raft/, which has been around since 2013, and has property-based fuzz testing! It really does look like it's got its own extensive tests; I'm surprised we found issues.
I'm curious if there's research into "better" primitives in Programming Languages in order to simplify writing distributed systems, analogous to how concurrency primitives beyond Mutexes, Semaphores, and Condition Variables (like Futures, Monitors, etc. or approaches such as Actors) can greatly simplify logic and enhance one's ability to reason about code. Or things like the Rust borrow checker.
There's a lot of research into this, actually! Folks have been working on ways to extract executable code from Alloy, TLA+, Isabelle/HoL, and Coq specifications. That doesn't help with implementations which don't use codegen though--and it doesn't help you with the parts of the program that aren't formalized.
There are two general issues here: there's the nuts and bolts, and then there's the emergent properties of the protocol.
In my experience, async (which includes futures/promises and actor-like mechanisms) makes the nuts-and-bolts problems, like avoiding data races, avoiding deadlock, and managing multiple things going on at once, way easier.
You still need fuzzing and model checking to make sure you got the strategic stuff right.
That said, the team I work on is about to release our first Raft-based product, so I might have a different opinion in a few months.
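The actor point above can be sketched concretely. Here's a minimal Rust example (all names here, like `Msg` and `spawn_counter`, are invented for illustration, not from any real library): one thread owns the state, and every mutation goes through a message channel, so there are no shared-memory races or lock-ordering deadlocks to reason about.

```rust
use std::sync::mpsc;
use std::thread;

// Messages the "actor" understands: mutate, or ask for the value.
enum Msg {
    Incr,
    Get(mpsc::Sender<u64>), // reply channel for reads
}

// Spawn a thread that exclusively owns the counter state. Callers
// interact only through the returned Sender, never the state itself.
fn spawn_counter() -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut count: u64 = 0; // owned by this thread alone
        for msg in rx {
            match msg {
                Msg::Incr => count += 1,
                Msg::Get(reply) => {
                    let _ = reply.send(count);
                }
            }
        }
    });
    tx
}
```

The design point is that the channel serializes all access, which is exactly what a Mutex does, but without the possibility of forgetting to take the lock or taking two locks in the wrong order.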
That's because the core Paxos protocol is relatively simple. However, no production service could ever survive with just that. Once you start considering all the features and optimizations you will really need, things get very hairy.
Well, Paxos is way more complex than Raft. I'm not saying building on top of Raft is easy; I'm saying making an MVP Raft implementation is easier than a Paxos one.
One thing I wish Raft had: a learner role, which acts like a follower that can't start an election until it has caught up with the rest of the cluster. etcd has it, but I wish it were part of Raft itself, along with bulk log transfer.
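The learner idea above boils down to one extra guard on role transitions. Here's a hedged Rust sketch (the types `Role`, `Node`, and `try_promote` are invented for illustration; this is not etcd's API): a learner receives log entries but ignores election timeouts, and it is promoted to a voting follower only once its log has caught up with the leader's commit index.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Role {
    Learner,  // receives log entries, never votes or starts elections
    Follower, // normal voting participant
}

struct Node {
    role: Role,
    last_applied: u64, // index of the last log entry applied locally
}

impl Node {
    // A learner may only become a voting follower once it has caught
    // up with the leader's commit index.
    fn try_promote(&mut self, leader_commit: u64) {
        if self.role == Role::Learner && self.last_applied >= leader_commit {
            self.role = Role::Follower;
        }
    }

    // Election timeouts are simply ignored while in the learner role,
    // so a freshly added, lagging node can never disrupt the cluster
    // by triggering an election it cannot win.
    fn can_start_election(&self) -> bool {
        self.role == Role::Follower
    }
}
```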
Article pointing out a very common issue that anyone who tried implementing raft runs into:
Letting a follower forward requests to the leader on a client's behalf is not easy to implement correctly, which is why the most popular Raft-based software (the HashiCorp stack) doesn't do it. Not worth it.
> Letting a follower forward requests to the leader on a client's behalf is not easy to implement correctly, which is why the most popular Raft-based software (the HashiCorp stack) doesn't do it. Not worth it.
I'm honestly surprised by this comment! I've written multiple Raft implementations, and request proxying was one of the easiest things to get right--it doesn't have to touch the Raft subsystem at all. Could you talk a little more about this?
Proxying isn't hard to implement, but in the Erlang Raft implementation, Ra, we decided to do it only when the client explicitly declares that it does not care about ordering. When proxying, it is always possible that the client will discover the current leader while the proxied request is still in flight. This may be more of a problem inside Erlang, where it is easy to address any process within the Erlang cluster.
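The "doesn't have to touch the Raft subsystem" claim can be illustrated with a routing layer that sits entirely in front of consensus. This is a hypothetical sketch (the `RouteResult` and `route` names are invented, not from Ra or any HashiCorp library): each node either applies a request locally as leader, forwards it to its current best guess at the leader, or tells the client to retry. The ordering caveat above is exactly the gap this sketch leaves open: while a forwarded request is in flight, the client may learn the new leader and send a second copy, so without idempotence or deduplication, ordering guarantees weaken.

```rust
#[derive(Debug, PartialEq)]
enum RouteResult {
    AppliedLocally,  // we are the leader; submit through Raft
    ForwardTo(u64),  // proxy to this node id on the client's behalf
    Unavailable,     // no known leader; client should back off and retry
}

// Pure routing decision: consensus state is only read (who is leader?),
// never modified, so this layer stays outside the Raft subsystem.
fn route(my_id: u64, known_leader: Option<u64>) -> RouteResult {
    match known_leader {
        Some(l) if l == my_id => RouteResult::AppliedLocally,
        Some(l) => RouteResult::ForwardTo(l),
        None => RouteResult::Unavailable,
    }
}
```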
I don't see anything in this blog that even touches on "distributed systems are hard". Every issue in here should be filed under "Redis has no tests". If you follow basic software engineering principles, you'll find distributed systems easier to approach.
My reading of the article's introduction is that Redis is adding this feature and is (among other things, I'm sure) paying Jepsen to test it. So this is them having tests.
> If you follow basic software engineering principles, you'll find distributed systems easier to approach.
When I implemented Paxos I had tests, and when they failed they spit out an exact trace of what happened, in what order, and on which node. Sometimes it was still excruciating to figure out what went wrong. Here's[1] a comment which you can think of as a bug tombstone. It took me half a day to figure out, even after I had a trace of the issue to analyze.
Sure, but now imagine you have no confidence that any part of your paxos implementation works at all, nevermind the paxos part. That's my impression of issue #13 from the article: not only did the software not pass the test, it's clear that nobody ever even tried to use it, at all!
Full-scale blackbox testing of a database system is similar to dogfooding. You only use it when you have high confidence that you have exhausted the possibilities of unit and integration tests. It's clear this project did not start with exhaustive unit tests.
It reminds me a bit of FoundationDB, which is also a terrible program nobody should entrust with data they ever want to see again. The first time I tried to use it, it ran out of memory and crashed in about ten seconds. I found the problem: their huge-page-aware allocator, which has no tests, had never actually been run by anybody on a machine with huge pages. It was a core library of a released database which had never been executed by anyone. This Redis thing is the same: nobody had ever said "RAFT SET foo bar"; if they had, they would have seen the problem right away.
I'm hesitant to draw too strong a conclusion here, and I can't speak for the Redis Labs team, but I do suspect that this is somewhere where... having an outside tester, like Jepsen (or a suitably adversarial QA team) can help detect missing-stairs sorts of problems. Coming from the perspective of a prospective operator (and having some experience with testing distributed systems), I immediately said "of course I want proxy mode by default", when this wasn't how the Redis-Raft designers necessarily intended things to be used--they intended smart clients to make it so that users wouldn't actually need proxy mode, so they hadn't focused on testing it that way.
Fair enough. I think I misinterpreted the "easier to approach" part of your original answer. Sorry if my answer came across as defensive. My wounds are still fresh. ;)
Part of this is release timing--the Dgraph and Redis analyses were mainly done in Q1 of this year, and I was able to write and release the Mongo & PostgreSQL reports on my own schedule. Wish I were that productive! ;-)
The scrutiny this release is going through makes me confident that the Redis Labs team will deliver in the end.
Also, if you are looking for a linearly scalable distributed pub-sub with strong guarantees around consensus and message persistence, it might be worth looking at Apache Pulsar.
Small typo: I believe the link in the sentence "Tangentially, we were surprised to discover that Redis Enterprise's claim of 'full ACID compliance'..." was copy/pasted incorrectly.