I once had to argue that we still need backups even though S3 has redundancy. They laughed when I mentioned a possible lock-out by AWS (even due to a mistake or whatever). I asked: what if we delete data from the app by mistake? They told me we need to be careful not to do that. I guess I am getting more and more tired of arrogant 25-year-old programmers with 1-2 years in the industry and no experience.
One thing you should absolutely not count on, but which might be a course of action for large clients, is to contact support and ask them to restore accidentally / maliciously deleted files.
I would never use this as part of a backup and restore plan, but I was lucky when a bunch of customer files were deleted due to a bug in a release. Something like 100k files were deleted from Google Storage without us having a backup. In a panic we contacted GCP. We were able to provide a list of all the file names from our logs. In the end, all but 6 files were recovered.
I think it took around 2-3 days to get all the files restored, which was still a big headache and impactful to people.
This is not a reliable mechanism, btw. There will be times when they won't be able to restore the data for you. Their product has options to avoid this situation, like object versioning.
S3 (and others) have version history that can be enabled.
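As a concrete sketch (bucket name is a placeholder), enabling versioning on an existing bucket is a single AWS CLI call, after which a delete only adds a delete marker and older versions remain recoverable:

```shell
# Enable versioning on an existing bucket (bucket name is a placeholder)
aws s3api put-bucket-versioning \
  --bucket my-example-bucket \
  --versioning-configuration Status=Enabled

# A "delete" now just adds a delete marker; prior versions are still listable:
aws s3api list-object-versions --bucket my-example-bucket --prefix some/key
```

Versioning alone won't stop someone with full permissions from deleting the versions explicitly, though, which is where object lock (mentioned below) comes in.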
If you have to take care of availability, redundancy, delete protection, and backups yourself, then why pay the premium S3 is charging?
Either you don't trust the cloud, in which case you can run a NAS or equivalent (easily, with S3-compatible APIs today) much cheaper, or you trust them to keep your data safe and available.
No point in investing in S3 and then doing it again yourself.
> No point in investing in S3 and then doing it again yourself.
I mean that's just obviously wrong, though.
There is a point.
> Either you don't trust the cloud, in which case you can run a NAS or equivalent (easily, with S3-compatible APIs today) much cheaper, or you trust them to keep your data safe and available.
What if you trust the cloud 90%, and you trust yourself 90%, and you think it's likely that the failure cases between the two are likely to be independent? Then it seems like the smart decision would be to do both.
Your position is basically arguing that redundant systems are never necessary, because "either you trust A or you trust B, why do both?" If it's absolutely critical that you don't suffer a particular failure, then having redundant systems is very wise.
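The arithmetic behind that intuition, treating the 90% figures above as purely illustrative:

```python
# Illustrative numbers from the comment above: you trust each system 90%,
# i.e. each fails (over some period) with probability 0.1.
p_cloud_fails = 0.1
p_self_fails = 0.1

# If the failure modes are truly independent, both failing at once is rare:
p_both_fail = p_cloud_fails * p_self_fails
print(p_both_fail)  # ~0.01, i.e. ~99% combined reliability
```

Of course, the multiplication only holds if the failures really are independent, which is exactly what the replies about shared hardware and supply chains dispute.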
My point is: if your redundancy is better than AWS's, then why pay for them? If it's not, then why invest in your own?
You can argue that you protect against different threats than AWS does. So far I have not seen a meaningful argument for a threat that on-prem protects against differently enough from the cloud that you need both.
Say, for example, your solution is to put all your data backups on the moon; then it makes sense to do both, since AWS does not protect against planet-wide threats.
However, if you are both protecting against exactly the same risks, provider redundancy only protects against events like AWS going down for days/months or going bankrupt.
All business decisions carry some risk; provider redundancy does not seem like a risk worth mitigating, given the cost it would mean for most businesses I have seen.
Even Amazon.com and Google's apps host on their own clouds rather than going multi-cloud, after all. Their regular businesses are much bigger than their cloud businesses, yet they still risk those to stick to their own cloud/services.
> My point is: if your redundancy is better than AWS's, then why pay for them? If it's not, then why invest in your own?
This is a really confusing question. Redundancy requires more than 1 option. It's not about it being better than AWS, it's that in order to have it you need something besides just AWS. AWS may provide redundant drives, but they don't provide a redundant AWS. AWS can protect against many things, but it cannot protect against AWS being unavailable.
> Even Amazon.com and Google's apps host on their own clouds rather than going multi-cloud, after all. Their regular businesses are much bigger than their cloud businesses
This is probably true with Google, but AWS contributes > 50% of Amazon's operating income. [1]
Interesting, no wonder AWS head became Amazon CEO.
Their retail/e-commerce side is less profitable than AWS, but the absolute revenue is still massive, and the risk of losing a chunk of that revenue (and income) due to tech issues is still an enormous risk for Amazon.
You and AWS are using similar chips and similar hard disks, even with similar failure rates.
If you both use the same hardware from, say, the same batch, both can have defects and fail at similar times. Or you use the same file system, which, say, corrupts both your backups.
90% is not a magic number; you need to know AWS's supply chains and practices thoroughly, and keep yours different enough not to share AWS's risks, for your system to have an independent probability of failure.
True. One would want to continually decorrelate services or model the dependencies. Redundancy will help even with some dependency, but you raise an important point.
But you still have some risks here, yes, with a super low probability, but a company-killing impact.
In some industries - banking, finance, anything regulated, or really (I'd argue) anywhere where losing all of your data is company killing - you will need a disaster recovery strategy in place.
The risks requiring non-AWS backups are things like:
- A failed payment goes unnoticed, AWS locks you out of your AWS account, that also goes unnoticed, and the account and data are deleted
- A bad actor gains access to the root account (by faxing Amazon a fake notarized letter, finding a leaked AWS key, or social-engineering one of your DevOps team) and encrypts all of your data while removing your AWS-based backups
- An internal bad actor deletes all of your AWS data because they know they're about to be fired
...and so on.
There's so many scenarios that aren't technical which can result in a single vendor dependency for your entire business being unwise.
A storage array in a separate DC somewhere where your platform can send (and only send! not access or modify) backups of your business critical data ticks off those super low probability but company-killing impact risks.
This is why risk matrices have separate probability and impact sections. Miniscule probability but "the company directors go to jail" impact? Better believe I'm spending some time on that.
Just to add that S3 supports a compliance object lock that can't even be overridden by the root user. Also AWS doesn't delete your account or data until 90 days after the account is closed.
Between these two protections, it's pretty hard to lose data from S3 if you really want to keep it. I would guess they are better protections than you could achieve in your own self managed DC.
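As a sketch of what that compliance lock looks like (bucket name and retention period are placeholders; note that Object Lock must be enabled when the bucket is created):

```shell
# Create a bucket with Object Lock enabled (required at creation time),
# then apply a default compliance-mode retention: for the retention period,
# object versions cannot be deleted or overwritten, even by the root user.
aws s3api create-bucket \
  --bucket my-locked-backups \
  --object-lock-enabled-for-bucket

aws s3api put-object-lock-configuration \
  --bucket my-locked-backups \
  --object-lock-configuration \
    '{"ObjectLockEnabled": "Enabled", "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 365}}}'
```

Compliance mode is irreversible for the retention window, so an overly long retention period becomes its own storage-cost risk.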
I'm guessing AWS has some clause in their contract that means they can refuse to deal with you or even return any of your data if they feel like it. Not sure if that's ever happened, but still worth considering it.
Yes, threat models are the obvious qualifier. If you have a business that requires backups on the moon in case of an asteroid collision, then by all means go for it. [1]
For most companies, what AWS or Azure offers is more than adequate.
An internal bad actor with that level of privileged access can delete your local backups too, and an external one can likely do everything he could do to AWS even more easily to your company's storage DC.
Bottom line: if customers can pay for all this low-probability stuff that can supposedly only happen on the cloud and not on-prem, sure, go ahead. Half the things customers pay for they don't need or use anyway.
[1] assuming your business model allows you to spend the expense outlay you need for the threat model
Nope. 3-2-1 strategy: 3 copies, 2 media, 1 offsite. Now try to delete files from the media in my safe. Only I have the key.
Sure, your threat model may vary. But relying on the cloud alone for your backups is simply not enough. If you split access to your AWS backup and your DC backup between two different people, you have mitigated that threat. If you only have one backup location, that's going to be very hard.
All of these are questions asked and solved 10 years ago by bean counters whose only job is risk mitigation.
Every cloud provider has compliance locks which even the root user cannot disable, plus version history, and you can set up your own copy workflow from one storage container to a second container, with no delete/update access on the second one, split between two different people, or whatever.
Not sure I agree about the usefulness of different media.
Having had to restore databases from tapes and removable drives for a compliance/legal incident, we had a failure rate of >50% on the tapes and about 33% for the removable drives.
I came away not trusting any backup that wasn't online.
At $50/month scale a lot of things are possible. Most companies cannot store their data in a hard disk in a safe. If you can, then cloud is a convenience not a necessity for you. I.e. you are perfectly fine running your storage stack for the most part.
My company is not very big (100ish employees) and we pay $200k+ to AWS for storage alone, and AWS is not even our primary cloud. If we had to do what you describe, it would probably cost another $500k in bandwidth alone. Add running costs in another cloud, plus recurring bandwidth for transfers and retrieval from Glacier for older data, on top of that. [1]
Over 3 years that would easily be $1-$1.5 million in net new expenses at our scale.
No sane business is going to sign off on +3x storage costs on a risk that cannot be easily modeled[2] and costs that cannot be priced into the product, just so one sysadmin can sleep better at night.
[1] your hard-disk-in-a-safe third component is not a sensible discussion point at reasonable scale.
[2] this would be: probability of data loss with AWS * business cost of losing that data > cost of a secondary system.
Or: probability of a data availability event (like now) * business cost of that > cost of an active secondary system.
For almost no business in the world would either equation hold.
For example, even with $100B of revenue at stake and only 6 nines of durability, the expected loss would be just $100,000 ($100B * 0.000001); a secondary system is much costlier than that.
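That back-of-the-envelope model, with the figures above as purely illustrative inputs:

```python
# Expected-loss model from footnote [2]: a secondary backup system only pays
# off if  P(loss) * cost_of_loss > cost_of_secondary_system.
# All figures here are illustrative, not real durability guarantees.

def expected_loss(value_at_risk: float, durability_nines: int) -> float:
    """Expected loss given a value at risk and N nines of durability."""
    p_loss = 10.0 ** -durability_nines  # 6 nines -> 1-in-a-million
    return value_at_risk * p_loss

loss = expected_loss(100e9, 6)  # $100B at stake, 6 nines
print(loss)                     # ~100,000
print(loss > 1_500_000)         # False: far below a $1-1.5M secondary system
```

The same function makes it easy to sanity-check other scenarios, e.g. lower durability or a higher value at risk.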
There are completely independent risks that you are dealing with here. If you are a small company there is a non-insignificant risk that your cloud account will be closed and it will be impossible to find out why or to fix it in a timely matter. There have been several that were only fixed after being escalated to the front page of Hacker News, and we haven't heard about the ones that didn't get enough upvotes to get our attention and were never fixed.
Also, what we saw on Dec 7th was that the complexity of Amazon's infrastructure introduces risks of downtime that simply cannot be fully mitigated by Amazon, or by any other single provider. More redundancy introduces more complexity at both the micro level and macro level.
It doesn't really cost that much to at least store replicated data in an independent cloud, particularly a low-cost one like Digital Ocean.
Backup on site and store tertiary copies in a cloud. Storing all backups in AWS wouldn't meet a lot of compliance requirements. Even multiple AZs in AWS would not pass muster as there are single points of failure (API, auth, etc).
Whether you realize it or not, you believe in the Scapegoat Effect, and it's going to get you into a shitload of trouble some day.
Customers don't care if it's your fault or not; they only care that your stuff is broken. That safety blanket of having a vendor to blame for the problem might feel like it'll protect your job, but the fact is that there are many points in your career where there is one customer we can't afford to lose for financial or political reasons, and if your lack of pessimistic thinking loses us that customer, then you're boned. You might not be fired, but you'll be at the top of the list for a layoff round (and if the loss was financial, that round will happen).
In IT, we pay someone else to clean our offices and restock supplies because it's not part of our core business. It's fine to let that go. If I work at a hotel or a restaurant, though, 'we' have our own people that clean the buildings and equipment. Because a hotel is a clean, dry building that people rent in increments of 24 hours. Similarly, a restaurant has to build up a core competency in cleanliness or the health department will shut them down. If we violate that social contract, we take it in the teeth, and then people legislate away our opportunities to cut those corners.
For the life of me I can't figure out why IT companies are running to AWS. This is the exact same sort of facilities management problem that physical businesses deal with internally.
I have saved myself and my teams from a few architectural blunders by asking the head of IT or Operations what they think of my solution. Sometimes the answer starts with, "nobody would ever deploy a solution that looked like that". Better to get that feedback in private rather than in a post-mortem or via veto in a launch meeting. But I have had less and less access to that sort of domain knowledge over the last decade, between Cloud Services and centralized, faceless IT at some bigger companies. It's a huge loss of wisdom, and I don't know that the consequences are entirely outweighed by the advantages.
Having an additional AWS account that some S3 buckets back up to, with write-only permissions (no delete), in an account that is not used by anyone, seems like a good idea for this type of situation/concern.
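A minimal sketch of such a setup (account IDs and bucket names are placeholders, and the exact statements would need tailoring): a bucket policy in the backup account that lets the production account write objects while denying deletes to everyone:

```shell
# Placeholder IDs/names. Applied in the backup account: the production
# account (111111111111) may write objects; deletes are denied to everyone,
# including principals in the backup account itself.
aws s3api put-bucket-policy --bucket my-backup-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowWriteFromProd",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    },
    {
      "Sid": "DenyDeletes",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    }
  ]
}'
```

Combined with versioning (so overwrites keep old versions), this gets close to append-only: an explicit Deny overrides any Allow in IAM evaluation, so even admins in the backup account can't delete through this path, short of editing the policy itself, which is why the account should sit behind separate credentials.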
I had this experience when I asked about S3 backups too (after a junior programmer deleted a directory in our S3 bucket...). The response from r/aws was "just don't let that happen" (or "use IAM roles").
Maybe they are getting tired of arrogant older programmers assuming they cannot possibly be wrong. God forbid a 25 year old might actually have a good idea (and I am far removed from my 20s).
Maybe having S3 redundancy wasn't the most important thing to be tackled? Does your company really need that complexity? Are you so big and such an important service that you cannot possibly risk going down or losing data?
I'm not sure how you got "backups are for old people" from my post. My point is that there are two sides to this. Perhaps the data being stored on S3 data _was_ backup data and this engineer was proposing replicating the backup data to GCP. That's probably not the highest priority for most companies. Maybe the OP was right and the other engineers were wrong. Who knows.
In my experience, the kind of person that argues about "arrogant 25 year olds that know everything" is the kind of person that only sees their side of a discussion and refuses to understand the whole context. Maybe OP was in the right, maybe they weren't. But the fact that they are focusing on age and making ad hominem attacks is a red flag in my book.
I’ve most definitely been in numerous places where arrogant 25-year-olds with CS degrees, but not smart enough to make it to FAANG, think they know what they are talking about when they don’t. Not every 25-year-old is an idiot, but many, especially in tech, think they are smarter than they are because they’re paid these obscene amounts of money.
But that's just it; you can't even have that discussion if the response to "hey, should we be backing up beyond S3 redundancy?" is "No. Why would we? S3 is infallible"
Sure you can. As the experienced engineer in that setting it is a great opportunity to teach the less experienced engineers. For example, "I have seen data loss on S3 at my last job. If X, Y, or Z happen then we will lose data. Is this data we can lose? And actually, it is pretty easy to replicate - I think we could get this done in a day or two."
It's also possible the response was "That's an excellent point! I think we should put that on the backlog. Since this data is already a backup of our DB data, I think we should focus on getting the feature out rather than replicating to GCP."
Those are two plausible conversations. Instead, what we have is "these arrogant 25 year olds that have 1-2 years of experience and know it all." That's a red flag to me.
>"Maybe they are getting tired of arrogant older programmers..."
And this is of course a valid reason to ignore basic data-preservation approaches.
Myself I am an old fart and I realize that I am too independent / cautious. But I see way too many young programmers who just read sales pitch and honestly believe that once data is on Amazon/Azure/Google it is automatically safe, their apps are automatically scalable, etc. etc.
Yes - the point of that line was to be ridiculous. Age has nothing to do with it. Anyone at any age can have good ideas and bad ideas. There are some really incredibly _older_ and highly experienced engineers out there. But there are others that think that experience means they are never wrong. Age has nothing to do with this - what is important is your past experience, your understanding of the problem and then context of the problem, and how you work with your team.
And again, my point isn't that you never need backups. My point is that it is entirely plausible that at that point in time backups from S3 weren't a priority.
Would you put the one and only copy of your family photo album up on AWS, where AWS going down would mean losing it? Because your customers' data is more important than that
AWS going down means I've lost it, or that I've temporarily lost access to it? Those are two very different scenarios. Of course S3 could lose data; a quick Google search shows it has happened to at least one account. My guess is it is rare enough that it seems like a reasonable decision not to prioritize backing up your S3 data. I'm not saying "never ever back up S3 data", only that it seems reasonable to argue it's not the most important thing our team should be working on at this moment.
I have my family photos on a RAIDed NAS. It took me years to get that setup simply because there were higher priority things in my life. I never once thought "ahh I don't need backups of our data" I just had more important things to do.