I'm not sure it even has any sort of cluster consensus algorithm. I can't imagine it not eating committed writes in a multi-node deployment.
Garage and Ceph (well, radosgw) are the only open source S3-compatible object stores that have undergone serious durability/correctness testing. Anything else will most likely eat your data.
Great, so this means that the only way to get an Android release that's up-to-date on security patches is a binary-only distro - either Google Pixel, or the GrapheneOS preview channel.
Just wonderful. Google should know better than this, and shame on the other OEMs that forced this mess.
If it's any consolation, preview builds are reproducible at the point the embargo ends. A bit better than the usual meaning of "binary-only".
It's just poor risk management at this point. Making sure that a configuration change doesn't crash the production service shouldn't take more than a few seconds in a well-engineered system, even if you're not doing a staged rollout.
They don't appear to have a rollout procedure for some of their globally replicated application state. They've had a number of major outages over the past few years that all had the same root cause: "a global config change exposed a bug in our code and everything blew up".
I guess it's an organizational consequence of mitigating attacks in real time, where rollout delays can be risky too. But if you're going to do that, the code has to be written much more defensively than it is right now.
Yeah, agreed. This is the same point that came up the last time they had an incident.
I really don’t buy this requirement to always deploy state changes 100% globally immediately.
Why can’t they just roll out to 1%, scaling to 100% over 5 minutes (configurable), with automated health checks and pauses? That would go a long way towards reducing the impact of these regressions.
And if they really think something is so critical that it has to go everywhere immediately, then sure, set the rollout to start at 100%.
Point is, design the rollout system to give you that flexibility. Routine/non-critical state changes should go through slower, ramping rollouts (something like the sketch below).
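It doesn't have to be elaborate, either. A rough sketch of the ramping logic I mean, in Python, where apply_to_fraction, health_check and rollback are hypothetical hooks standing in for whatever the real control plane exposes:

    import time

    STAGES = [0.01, 0.05, 0.25, 1.0]   # 1% -> 100%
    STAGE_WAIT = 60                    # seconds to soak each stage (configurable)

    def staged_rollout(new_config, apply_to_fraction, health_check, rollback):
        for fraction in STAGES:
            apply_to_fraction(new_config, fraction)
            deadline = time.time() + STAGE_WAIT
            while time.time() < deadline:
                if not health_check():
                    # Error rates or latency regressed: stop ramping and undo.
                    rollback(new_config)
                    return False
                time.sleep(5)
        return True  # fully rolled out

    # Truly urgent mitigations can still skip the ramp with STAGES = [1.0].

The point isn't this exact loop, it's that the knob exists at all.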
For hypothetical conflicting changes (read: worst case, where unupgraded nodes/services can't interop with upgraded ones), what's the best practice for a partial rollout?
Blue/green and temporarily ossify capacity? Regional?
That's OK, but it doesn't solve issues you only notice on actual prod traffic. It can be a nice addition to catch problems earlier with minimal user impact, but best practice on large-scale systems still requires a staged/progressive prod rollout.
If there is a proper rollout procedure that would've caught this, and they bypass it for routine WAF configuration changes, they might as well not have one.
Ceph is quite expensive in terms of resource usage, but it is robust and battle-tested. RustFS is very new, very much a work in progress[1], and will probably eat your data.
If you're looking for something that won't eat your data in edge cases, Ceph (and perhaps Garage) are your only options.
When you want just the API for compatibility, I guess?
Self-hosted S3 clones with actual durability guarantees exist, but the only properly engineered open source choices are Ceph + radosgw (single-region, though) or Garage (global replication based on last-writer-wins CRDT conflict resolution).
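For anyone unfamiliar with what last-writer-wins means in practice, here's a toy LWW register merge in Python. This is my own illustration of the general technique, not Garage's actual code: concurrent updates never block replication, because every replica resolves a conflict the same way, but the "losing" concurrent write is silently discarded.

    from dataclasses import dataclass

    @dataclass
    class LWWRegister:
        value: bytes
        timestamp: float   # e.g. wall-clock time of the write
        node_id: str       # tie-breaker so merges are deterministic

        def merge(self, other: "LWWRegister") -> "LWWRegister":
            # Keep whichever write is "newer"; ties broken by node id.
            if (other.timestamp, other.node_id) > (self.timestamp, self.node_id):
                return other
            return self

    # Two replicas accept concurrent writes during a partition...
    a = LWWRegister(b"v1", timestamp=100.0, node_id="node-a")
    b = LWWRegister(b"v2", timestamp=100.5, node_id="node-b")
    # ...and converge to the same state once they sync, with no consensus round.
    assert a.merge(b) == b.merge(a) == b

That trade-off (availability and convergence over strict ordering) is arguably a reasonable fit for S3-style object semantics.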
Shipping updates almost weekly is the opposite of what I want for a complex, mission-critical distributed system. Building a production-ready S3 replacement requires careful, deliberate and rigorous engineering work (which is what Garage is doing[1]).
It's not clear whether RustFS even implements a proper distributed consensus mechanism. Erasure coding with quorum replication alone is not enough for partition tolerance, and I can't find anything about this in their docs.
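To spell out the quorum point (generic rule of thumb, nothing RustFS-specific):

    # Quorum sizing: any read quorum must intersect any write quorum.
    def quorums_intersect(n: int, w: int, r: int) -> bool:
        return w + r > n

    assert quorums_intersect(3, 2, 2)        # classic 3-way replication
    assert not quorums_intersect(3, 1, 1)    # W=R=1 can serve stale reads

    # Intersection only guarantees a reader contacts *some* replica that
    # accepted the latest acknowledged write. Ordering concurrent writes and
    # agreeing on the replica set itself (when nodes fail, rejoin, or the
    # cluster is resized) still needs consensus (Raft/Paxos) or an explicit
    # CRDT model; quorum arithmetic alone provides neither.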
Since storage is a critical component, I watched it closely and engaged with the project for about two years while contemplating adding it to our project, but in my opinion it is still immature from a reliability perspective.
No test suite, plenty of regressions, and data-loss bugs on core code paths that should have been battle-tested after so many years. There are many moving parts, which is both its strength and its weakness: anything can break, and does break. Even erasure coding/decoding has had problems, though a contributor from Proton has fixed a lot in that area lately.
One of the big positives, in my opinion, is the maintainer: he is an extremely friendly and responsive gentleman. SeaweedFS is also the most lightweight storage system you can find, is extremely easy to set up, and can run on servers with very modest hardware.
Many people are happy with it, but you'd better be ready to understand their file format so you can fix corruption issues by hand. As far as I'm concerned, after watching all these bugs I realized that the idea of using SeaweedFS was causing me more anxiety than peace of mind. Since we didn't need to store billions of files yet, not even millions, we went with writing a file storage API in ASP.NET Core in an hour or two, hosted on a VPS, that we replicate with rsync without problems. Since I made this decision, I have peace of mind and no longer think about my storage system. Simplicity is often better, and OSes have long been optimized to cache and serve files natively.
If you're not interested in contributing fixes and digging into the file format when a problem occurs, and your data is important to you, then unless you operate at the billions-of-files scale where SeaweedFS shines, I'd suggest rolling your own boring storage system (a rough sketch of that approach is below).
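For what it's worth, the boring approach really is only a screenful of code. A rough sketch of the same idea using Python's standard library (the original was ASP.NET Core; the path, port and lack of auth here are placeholders):

    import os
    from http.server import BaseHTTPRequestHandler, HTTPServer

    STORAGE_DIR = "/var/lib/files"   # placeholder path

    class FileStore(BaseHTTPRequestHandler):
        def _target(self):
            # Map the URL path into STORAGE_DIR and refuse anything escaping it.
            full = os.path.realpath(os.path.join(STORAGE_DIR, self.path.lstrip("/")))
            root = os.path.realpath(STORAGE_DIR)
            return full if full.startswith(root + os.sep) else None

        def do_PUT(self):
            target = self._target()
            if target is None:
                self.send_error(400)
                return
            os.makedirs(os.path.dirname(target), exist_ok=True)
            length = int(self.headers.get("Content-Length", 0))
            with open(target, "wb") as f:
                f.write(self.rfile.read(length))
            self.send_response(201)
            self.end_headers()

        def do_GET(self):
            target = self._target()
            if target is None or not os.path.isfile(target):
                self.send_error(404)
                return
            with open(target, "rb") as f:
                data = f.read()
            self.send_response(200)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    # Replication is a cron job on the primary, e.g.:
    #   rsync -a --delete /var/lib/files/ backup-host:/var/lib/files/
    HTTPServer(("0.0.0.0", 8080), FileStore).serve_forever()

It obviously doesn't scale to billions of objects, but if you're nowhere near millions of files it's hard to beat for operational simplicity.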
We're in the process of moving to it, and it does seem to have a lot of small bugfixes flying around, but the maintainer is EXTREMELY responsive. I think we'll just end up doing a bit of testing before upgrading to newer versions.
For our use case (3 nodes, 61TB of NVMe) it seems like the best option out of what I looked at (GarageFS, JuiceFS, Ceph). If we had 5+ nodes I'd probably have gone with Ceph though.
People underestimate how much fakeness a lot of these "open-core/source" orgs have. I guarantee that from day one of the MinIO project they had eyes on future commercialization, and of course they made contributors sign away their rights knowing full well they were going to go closed source.