> When asked, RDX Works executives informed Jepsen that blockchain/DLT readers would normally understand present-tense English statements like these to be referring to potential future behavior, rather than the present.
> Jepsen is no stranger to ambitious claims, and aims to understand, analyze, and report on systems as they presently behave—in the context of their documentation, marketing, and community understanding. Jepsen encourages vendors and community members alike to take care with this sort of linguistic ambiguity.
I like to think most people are able to read marketing material in the context of the roadmap and understand the difference between claimed and measured performance; apples and oranges comes to mind.
No, most people would like to be richer for less effort, and therefore will invest early in a technology, exposing themselves to larger upsides were it to take off, and repeat marketing material without incurring responsibility if it's false. I won't hold you responsible for what you just said, for instance, because I assume you're just someone who read the official communication of the company you're invested in, even if it's objectively false to claim that readers are expected to resolve ambiguity in a way closer to the spirit than to the letter.
Making marketing material truer would reduce the number of people trapped in the vicious cycle of repeating false claims for the sake of propping up a risky investment, and is good for everyone involved long term.
But short term, making outrageously optimistic statements creates a large number of proponents, which is probably what financed this report in the first place.
The difficulty is in ramping up the truth without doing too much damage to the franchisees, who are the ones who put up the actual money but valued their investment in light of a false or exaggerated statement.
It's funny how the comments here are polarized, some of them claiming that Jepsen slaughtered RDX, others that it proved that the consensus layer is rock solid.
Let's appreciate this for what it is. Blockchains are, at their heart, a type of database (or at the very least a ledger which can be the foundation on which some subset of database semantics can be layered). Performance and reliability are empirical claims which can be tested empirically, using the kind of methodology that Jepsen has been innovating for many years. It is very much to the credit of RDX Works that they subjected their product to this type of testing. I'm not saying anything about the way they use the test results in their blog post and marketing materials, though.
What I'd like to see going forward is that it's routine for blockchain-based databases to be tested the same way as real databases, based on actual shipping product rather than speculative goals. Whether you think this would be validating or devastating reveals quite a bit about your preconceptions, but either way would be a win for truth and progress.
I would love for a highly reputable team or individual to test blockchains and their claims. However, the testing should be done in a way that respects the fact that blockchains are a different type of database and should be judged as such.
For example, because of the blockchain trilemma, increasing transaction speed isn't always a good thing since it could sacrifice decentralization.
Right, but there's a noticeable difference between claiming millions of transactions per second on your blockchain home page and measuring around 5 in an audit. Sure, high transaction speed isn't always a good thing, and 5 TPS might be very good, but then it should be marketed as such, with a slower speed than competitors, rather than saying it broke a world record, maybe?
No offense but you and the people downvoting me don't seem to understand blockchains very well. If TPS is all that matters then there would be no need for a blockchain, just use a traditional database.
As someone who's been in the blockchain space for 8 years I've seen numerous scamcoins become popular because they lure people in with claims of high TPS. High TPS isn't actually that hard of a problem. High TPS with decentralization and security IS hard.
Case in point: blockchains like Bitcoin and Ethereum limit TPS on purpose, specifically because they care about decentralization. After all, that's the whole point of a blockchain.
I would encourage you and the people downvoting to read up on a few things.
1) The blockchain trilemma.
2) Why Bitcoin and Ethereum limit transaction speed and block size.
3) Layer 2 Rollups (zkRollups) that offer an actual solution to transaction speeds without sacrificing decentralization.
In summary, my point is that judging a blockchain by the standards of a traditional database is like judging a traditional database by the standards of a filesystem. Sure, they both store data, and yes, a filesystem might store data faster, but there are other constraints that make them fundamentally different (like ACID transactions).
I agree TPS doesn't matter; what I'm saying is that honesty does! Who cares, they could have claimed a record of 5 TPS and that would be it. Why lie about millions, speak of a world record, and then launch an army of over-invested mobs fawning over it everywhere?
We understand crypto very well: the goalposts are always changing for a solution looking for a problem. Yesterday you were probably telling everyone your investment was the best because it could handle a million transactions per second; today you tell us we're idiots and TPS doesn't matter. So what does matter, what is this thing good for? It's no better than Ethereum, and at least Ethereum can launch a tree of related scams, so it has half a utility.
I admire aphyr's ability to dive into a new and complex distributed systems technology and understand it enough to evaluate it for correctness. I hope the Radix developers have ears to listen; the comment in the report "RDX Works informed Jepsen that the blockchain/DLT community had developed idiosyncratic definitions of safety and liveness" is not encouraging.
> To Jepsen’s surprise, RDX Works asserted that phenomena such as aborted read, intermediate read, and lost writes do not constitute safety violations (in the blockchain sense). RDX Works claims that to describe these errors as safety violations would not be understood by readers from a blockchain background; this report is therefore “factually incorrect”. On these grounds, RDX Works requested that Jepsen delete any mention of our findings from the abstract of this report.
In private DBs, reads from a DB node are considered transactions and need to follow the same rules as writes.
But on public blockchains (ledgers), only state manipulation matters.
For example, Metamask obtaining an address balance would be a transaction, but no one calls it that because it doesn't modify the state.
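To make the distinction concrete, here's a minimal sketch of the kind of read-only balance lookup being described, assuming an Ethereum-style JSON-RPC node (which is what Metamask talks to); the endpoint URL and address below are placeholders, not real values:

```python
# Read-only balance lookup against an Ethereum-style JSON-RPC node.
# It inspects ledger state but submits no transaction and changes nothing.
import json
import urllib.request

RPC_URL = "http://localhost:8545"  # hypothetical local node
ADDRESS = "0x0000000000000000000000000000000000000000"  # placeholder address

payload = {
    "jsonrpc": "2.0",
    "method": "eth_getBalance",     # a pure read; nothing is written
    "params": [ADDRESS, "latest"],
    "id": 1,
}

req = urllib.request.Request(
    RPC_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    balance_wei = int(json.loads(resp.read())["result"], 16)

print(balance_wei)  # no state was modified by making this call
```

Whether you call that a "transaction" or not is exactly the terminology gap being argued about here.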
Sure. But they could have asked for the additional clarifications or context to be added to make this clear, instead of requesting a bunch of stuff be removed because they’re concerned it’ll paint them in a bad light.
As I understand from the report, no request was made to remove the content, only to leave terms like "liveness break" and "safety break" out of the abstract until those terms were defined in the main report.
I don't think readers with any kind of computational background expect to be able to read FAILED writes. Apparently, neither does RDX Works, which claims to have fixed most of the issues.
Pleased they ignored that request too, although I can see where RDX are coming from. In a distributed ledger it's all about state. The consensus layer of the architecture is rock solid according to this report.
Yeah, the issue is serious but at the same time:
"This problem occurred only in cases where every node was killed at roughly the same time".
And there are 100 nodes on the network.
(and it is fixed now)
Well, Radix says it is fixed now. That's the gist of their response to this report, "we fixed a lot of things and no one's tested the new code but trust us!"
Isn't this a rather common configuration for a distributed DB? Given one node dies before flushing, you trust that the remaining nodes in the system will not die at the same time and will live long enough to flush to their respective disks. It's a gamble, yes, but depending on your environment the risk can be smaller than the benefits. For a blockchain ledger, you might want to choose the safer corner of CAP in that equation.
> Isn't this a rather common configuration for a distributed DB?
In my experience it is not. Rather, aggressive fsyncing/O_DIRECT usage are common. The rationale for this is usually partition risk: better to durably log a write before propagating it than to potentially fail in propagating it and then be left in a position of having to either reactively fsync or hope that automatic flush-to-disk will persist your unexpectedly-sole possession of that update.
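For what it's worth, "durably log before propagating" boils down to something like the toy sketch below (hypothetical file name and record format; a real database layers group commit, checksums, recovery, etc. on top): the record is fsynced to local storage before the node replicates or acknowledges it.

```python
# Toy write-ahead append: persist locally (fsync) before replicating or
# acknowledging. File name and record format are made up for illustration.
import os

def durable_append(path: str, record: bytes) -> None:
    # O_APPEND so concurrent appenders don't clobber each other's offsets.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record + b"\n")
        os.fsync(fd)  # push the data to stable storage before anyone else sees it
    finally:
        os.close(fd)

# Only after durable_append returns would the node propagate the write to
# peers or acknowledge it to the client; a crash before propagation then
# costs availability, not the write itself.
durable_append("wal.log", b"set balance=42")
```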
> The consensus layer of the architecture is rock solid according to this report.
That's quite a stretch. The report states explicitly a) the usual proviso that they can only prove the presence of bugs, not their absence, but more pertinently b) that their methodology is more usually applied to lower-latency databases, with the implication that they are less confident of their conclusions in this new regime:
> Radix’s low throughput and high latency may have masked safety violations. In particular, our tests required several hours to reproduce e.g. aborted read (#13).
Note also that they didn't even attempt to test what happens in the presence of malicious nodes!
You're also financially involved, akin to a franchisee or a "member" in an MLM, and make money providing staking services on this blockchain. That's fine, but it's important to put your comments in context: you will always tend to defend your overweight investment in this pyramid, while I had never heard of it until today.
This report will not help your business, but you should pressure RDX Works into commissioning a new one on the next version rather than trying to convince us we misread some quite shocking things in the first one :s
Classic blockchain startup in the 2020s: put a cool name on your version of sharding, act as if it were already live and being used by governments / corporations, and hope enough people put rockets in the discord to keep the VC money coming in.
Slightly out of context, the 1.4mm tps test was during an earlier iteration of the sharded architecture (coming in 2023) whereas Jepsen were testing the unsharded mainnet.
It's a bold claim to say that your system can handle six orders of magnitude more load than it is actually capable of because next year you might release some software that does better.
The Financial Times recently changed their abbreviation for million from “m” to “mn” to help screen readers. They would typically read an m suffix as metres; I guess the FT chose mn rather than mm so that screen readers would not say millimetres instead of million.
I’ve seen that use for sure. Perhaps a little telling we’re talking about blockchain tech and that’s what people are using. Is it really about a fast distributed transaction log, or… is it about the money first.
But they wrote it in the present tense, and even said it was a world record! And how can you say an "earlier iteration" will come in the future? That seems like the kind of crypto truth-bending we see a lot :s
Admit it, they exaggerated, then removed the statement and instead put it in a timeline that now says: "Unlimited scalability and composability to carry DeFi into the global mainstream future with sharded, linearly scalable Cerberus consensus."
> 1.4m TPS on a DLT: Radix' last consensus algorithm 'Tempo' publicly achieved 1.4m TPS in 2018, the current world record. The new algorithm 'Cerberus' is theoretically infinitely scalable
For me that means they pretended to have this throughput in 2018, before this "Cerberus" miracle that is infinite. So one could repeat on Twitter at every opportunity that Radix does 1.4m TPS, a world record, and will "later" be "infinite" with Cerberus.
Aphyr and Jepsen are a blessing, both for breaking down software like this and for illustrating complex distributed systems concepts with real-world evidence. The distributed systems class provided by Aphyr is amazing (https://github.com/aphyr/distsys-class)
Aphyr is the best. No matter your fave distributed tech and all its CAP-doesn't-matter stuff, he shows that... CAP does very much matter and it's really hard.
Cassandra? Kafka? MongoDB (bwahahahah)? It's all got edge cases.
He should be getting paid a million bucks a year by various auditing/accounting firms and the FTC/SEC for validating crypto claims. It would be a massive public service.
I like that it's being treated like a database, and that the safety/correctness of any blockchain has to be viewed from a distributed database standpoint.
That's the best part: they paid him to throw such savage shade on their blockchain. One transaction per second, one million transactions per second, what's the difference? It's just the roadmap, man.
Makes you wonder why they ended the collaboration in November and didn't continue to retain him to validate that the new stuff lives up to its claims, doesn't it?
It's not a secret at all.
The response from the Radix CEO: "We re-used the part of his testing harness, and then re-created the other critical tests to determine if the errors detected where still present. He is fantastically expensive and booked 3-6 months in advance on average. We'll definite re-deploy this kind of testing again, but we'll save it for another bigger release, rather than a patch where the identified errors can be tested against."
Yes, generally you pay third parties for their consulting services, which is exactly what RDX Works did here. Are you familiar with the history of Jepsen tests? Because this report is a pretty standard analysis of his. I think if you read some of the other distributed systems analyses he has done, this would be quite clear. Your takeaway that this was some sort of a "shade throwing" event is pretty bizarre.
I'm familiar with Jepsen. Most consultants may not be willing to lie, but they'll take care not to mention what you pay them not to mention. If this work was done by anyone else they'd probably say something like "performance was not evaluated because the contract only covers correctness", but no, Aphyr has to dissect the false marketing in detail. And Radix had the balls to hire him presumably knowing that this is what he does.
I have pretty much the opposite perspective on Jepsen from most of the folks here. My feeling is that essentially no distributed system is perfect or without tradeoffs (and certainly no single-node system is perfect and without tradeoffs), and Jepsen posts basically make that clear over and over again, like some form of techie outrage porn... but the tone and implication is as if somehow there is some alternative panacea without tradeoffs... and I find that a little bit misleading.
Basically, an astute reader of Jepsen testing may deduce "I need to use a single-node system", which is one option, but one without the availability characteristics modern users usually want.
If you want to look at a Jepsen test which shows a system mostly working as designed, the etcd 3.4.3 one (https://jepsen.io/analyses/etcd-3.4.3) is a much nicer read. Quoting from the discussion section, "etcd 3.4.3 lived up to its claims for key-value operations: we observed nothing but strict-serializable consistency for reads, writes, and even multi-key transactions, during process pauses, crashes, clock skew, network partitions, and membership changes"
That's what a successful test looks like. Sure, they found correctness issues with the lock API, but they were able to lend some confidence to etcd's core API. There might be bugs, but Jepsen didn't observe any.
Sure, that's not proof that systems don't have tradeoffs, but it is proof that Jepsen doesn't literally always find consistency issues or other core problems. Said another way, Jepsen tests systems for correctness issues. They either find some or they don't. These results are interesting, even if they are not a panacea.
There are other tradeoffs, but for most systems, correctness is important enough that it merits consideration in its own right.
I don't think readers of jepsen misunderstand what's being tested or what it means, nor do they misunderstand that there are other tradeoffs to consider, so I think your comment is off the mark.
By analogy, if we were reading a post about "I load-tested this bridge which claims to support 50 tons of weight, and it broke at 5 tons", no one would be saying "yeah, but there are tradeoffs for bridges. This post just makes it clear that there's tradeoffs. If you make the bridge stronger, it would be more expensive, and to imply there's not tradeoffs is misleading. An astute reader might deduce that they should never drive on bridges again".
I don't think that would be a reasonable interpretation of such a post, nor do I think the interpretation you espouse here portrays an accurate sentiment.
That is not what Jepsen demonstrates. Instead, it shows systems making claims that are not supported by their implementations and/or algorithms. If you want to act on your claim/suggestion, then those systems just have to stop making the claims he is testing for.
PS: There is also another trend: systems claiming they fixed the issues Jepsen found, without submitting themselves again for analysis... but I digress now...
I respectfully disagree. Human language, and the characteristics of distributed data stores in the real world, have an inherent ambiguity that Jepsen, in my view, pretends isn't there.
Here's an analogy that may help me communicate how I feel, since I realize my message is not landing: let's say I'm buying a condo in the San Francisco Bay Area, and the building I'm looking to buy in advertises that it is historic but seismically retrofitted. Then say a Jepsen-earthquake-test comes in, shakes the ground beneath the building, and shows that the building does indeed collapse with a big enough earthquake: would that or would that not be enough information for me to decide whether the seismically retrofitted building is good enough for my needs? There's a lot of ambiguity in answering that question.
> I respectfully disagree. Human language, and the characteristics of distributed data stores in the real world, have an inherent ambiguity that Jepsen, in my view, pretends isn't there.
We've been doing distributed systems for over 50 years now. There's nothing ambiguous in either the language Jepsen uses or the claims he is examining. The crypto shills chose to pretend the language is ambiguous and invent their own definitions on purpose.
Including the ridiculous "nah, everyone understands that when we speak in present tense it means we mean some unspecified point in a nebulous future".
Although I agree with your point on language and its ambiguity, I would argue that is a different claim from the one you made above, and the one I replied to.
When he demonstrated that Riak was dropping 30-70% of writes even with the strongest consistency settings, or that Mongo had multiple scenarios of data loss, we were not talking about the subtleties of English idiom.
But the contrived nature of Jepsen tests is so different from the real world. In the real world, no system behaves exactly as it was designed to behave; the real world has cosmic rays, earthquakes, and everything in between. So no statement about a software system can ever be simply accepted as a fundamental fact, and proving that software systems break is therefore misleading with respect to what most users need to understand about those systems.
A Jepsen-style test optimized not for bending things until they break, but for showing likely real-world situations, along with an evaluation of which are most likely to arise, would be far more valuable for people.
You make some reasonable points, but I would say a couple things here:
- If we're sticking with your example further up-thread, you'd be buying a building that was advertised as "can withstand a 4.0 magnitude earthquake" and that then failed when subjected to a 4.0 magnitude earthquake.
- 4.0 magnitude earthquakes happen all of the time [0].
More or less what I'm saying is, engineers generally don't think DBs lose data, and when they start coming up with ways that might happen (node failure, network failure, clock desync), distributed DBs assure them with algorithms and configuration knobs. Aphyr puts those assurances to the test, which is so, so valuable to us all.
It's also worth saying that this space is pretty technically complicated. All the DB engineers I know use some form of Jepsen-style testing (or Jepsen itself) because it's amazingly great.
This is such a clown take. Read any root cause analysis of your favorite cloud operator and you will find all of the scenarios that Jepsen tests for in those.
And you know what's even better? They can be solved!
Jepsen is all about verifying claims and communicating limitations. Companies hire them for this as due diligence. They fix real world bugs based on the results.
> what most users need to understand about said systems
The target audience of these reports are the system builders and the software engineers building services on top of them; not end-users consuming higher level services.
We do know that some databases fare much better than others, and that's useful to many. In your analogy it would be many builders claiming their buildings to be earthquake proof, with only some actually being it. Thanks to Jepsen, customers know where to buy.
Note, this analogy is slightly off for using the phrase "some actually being it": Jepsen cannot prove "earthquake proof" (correctness).
Rather, it would be: "Builders claim their buildings are earthquake proof, and Jepsen was able to show that a subset collapse in earthquakes. The rest may or may not be earthquake proof."
That's still very valuable. It's really valuable to know when something is wrong. It would be more valuable to know that something is definitely right (correct / "earthquake proof"), but Jepsen cannot prove that.
If you find out, let me know! Folks are always asking me "What database is best?" and I'm just shuffling my feet and going "Uhhhh, well, systems are really hard to build correctly, there's lots of tradeoffs, you probably want something with a proven replication model under the hood but so much depends on workload and operational characteristics beyond just safety... If you find a database you love please tell me about it because I like hearing people's stories..."
FWIW, I think you're raising really good questions in this thread. Qualitative safety is highly contextual, depending on fault model, significance of individual operations, concurrency, throughput, latency demands, operational characteristics, data volume, etc etc., and I try to touch on that in the "Toward a Culture of Safety" section in this report. Hopefully that resonated for you.
I think it's a refreshingly mature section, and my hat's off to you for going there. Unfortunately, I don't think the typical reader, who I really do believe loves Jepsen outrage porn, is going there.
I disagree that the analysis articles from Jepsen, taken as a whole, are arguing for a "perfect distributed system without trade offs". In my experience these articles are fairly even-handed; indeed, many products make some changes based on the analysis and then contract for a follow-up. That the "tone and implication is as if somehow there is some alternative panacea without tradeoffs..." is not something I observed in this analysis or the previous ones that I have read.
In my opinion the RethinkDB 2.1.5 report went fairly well for that project. If the claims are aligned with the reality of the product, it's clear the Jepsen report will highlight that.