While I don't have anything smart to say about stuff like this, I'd love to see their postmortem on this one. It would be somewhat hilarious if it's another bad configuration push across their infrastructure. Github has had so many of those, especially given the recent article about the stock brokerage firm that went bankrupt due to a bad configuration/code push.


A firm going bankrupt due to a bad code/config push? Do you have a link to the article you're referring to? Sounds interesting...



Just the fact that they didn't have a way to record and verify whether the deployment was done properly boggles my mind. When I worked at a bank, we had package management to do deployments and a separate tool for taking inventory of installed software (in case users managed to sneak third-party programs onto their systems). On top of that, we had a web framework for tracking project milestones that allowed manual entry by technicians as well as automated input from scripts, so tasks that had to be done by hand, like replacing hardware, could be coordinated with build scripts, and management could monitor the whole thing from a dashboard.
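The "verify the deployment actually happened" part can be surprisingly small. A minimal sketch, assuming a Python environment and a hypothetical expected_versions.json manifest mapping package names to the versions you intended to ship; any real setup would check far more than pip packages:

    # Sketch: compare installed package versions on a host against an expected
    # manifest and exit non-zero on any mismatch, so a deploy script can fail loudly.
    import json
    import subprocess
    import sys

    def installed_versions():
        # 'pip list --format=json' prints [{"name": ..., "version": ...}, ...]
        out = subprocess.run(
            ["pip", "list", "--format=json"],
            capture_output=True, text=True, check=True,
        ).stdout
        return {p["name"].lower(): p["version"] for p in json.loads(out)}

    def verify(manifest_path="expected_versions.json"):
        # Manifest format is an assumption: {"requests": "2.31.0", ...}
        with open(manifest_path) as f:
            expected = json.load(f)
        installed = installed_versions()
        ok = True
        for name, want in expected.items():
            got = installed.get(name.lower(), "MISSING")
            if got != want:
                print(f"{name}: expected {want}, found {got}")
                ok = False
        return ok

    if __name__ == "__main__":
        sys.exit(0 if verify() else 1)

Nothing fancy, but it gives you a record of what was actually on the box at deploy time instead of trusting that the push worked.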


Wow! Bookmarking that one. What a great cautionary tale for both developers and devops. I may well need to use it as a teaching aid. Though it's a security principle, I cannot tell you how many times I have had to point out the need for defense in depth in the design of software.


A reference to Knight Capital Group, most likely.


Maybe. They have been working on migrating some of the repositories over to new hardware lately.


I doubt they are pushing new configuration at 6am on a Sunday morning.


I doubted that Github would push new (bad) production code mid-day unannounced, and it still happened. To be fair, I think Github pushes new production code several times a day, every day?


If you are pushing new configuration at 6am on a Sunday, and things immediately stop working, you revert the configuration change.


If you work in an environment where such rollbacks are that simple, you're in the rare minority. The reality of working with large-scale distributed systems is that rolling back becomes much more complicated. Push out new code and the accompanying DB schema change? Good luck rolling back to the older schema when you find the bug.
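The usual way around the schema problem is to split the change so old and new code can both run against the same database, and only do the irreversible part once a rollback is no longer plausible. A minimal sketch of that expand/contract idea, using sqlite3 and made-up table/column names purely for illustration:

    # Sketch of an expand/contract migration: add new columns first, backfill,
    # and defer dropping the old column so a code rollback still works.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO users (full_name) VALUES ('Ada Lovelace')")

    # Step 1 (expand): add the new columns as nullable. Old code keeps writing
    # full_name and ignores the new columns, so rolling the code back is safe.
    conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")

    # Step 2 (backfill): populate the new columns from the old one.
    rows = conn.execute("SELECT id, full_name FROM users").fetchall()
    for user_id, full_name in rows:
        first, _, last = full_name.partition(" ")
        conn.execute(
            "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
            (first, last, user_id),
        )
    conn.commit()

    # Step 3 (contract) happens much later: dropping full_name is the
    # irreversible part, so it waits until nobody could still need to roll back.
    print(conn.execute("SELECT first_name, last_name FROM users").fetchall())

It's more work than a single migration, which is exactly the point of the comment above: at scale, "just revert it" stops being a one-liner.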


I don't think it's that simple, especially if the new bad configuration has already run amok with the machine/data. Backups are a thing, but at that scale they're probably not completely current to the minute.


I don't know about that. In a lot of environments, weekend deployments are common, even in full devops/automatic-deploy setups.



