So, to recap: - Their database permissions changed unexpectedly (??) - This caus...

tptacek · 2025-11-18T23:59:41 1763510381

People jump to say things like "where's the rollback" and, like, probably yeah, but keep in mind that speculative rollback features (that is: rollbacks built before you've experienced the real error modes of the system) are themselves sources of sometimes-metastable distributed system failures. None of this is easy.

0xbadcafebee · 2025-11-19T04:36:44 1763527004

How about where's the most basic test to check if your config file will actually run at all in your application? It was a hard-coded memory limit; a git-hook test suite run a MacBook would have caught this. But nooo, let's not run the app for 0.01 seconds with this config before sending it out to determine the fate of the internet?

This is literally the CrowdStrike bug, in a CDN. This is the most basic, elementary, day 0 test you could possibly invent. Forget the other things they fucked up. Their app just crashes with a config file, and nobody evaluates it?! Not every bug is preventable, but an egregious lack of testing is preventable.

This is what a software building code (like the electrical code's UL listings that prevent your house from burning down from untested electrical components) is intended to prevent. No critical infrastructure should be legal without testing, period.

paulddraper · 2025-11-19T00:41:08 1763512868

Looks like you have the perfect window to disrupt them with a superior product.

mercnz · 2025-11-19T01:20:54 1763515254

just before this outage i was exploring bunnycdn as the idea of cloudflare taking over dns still irks me slightly. there are competitors. but there's a certain amount of scale that cloudflare offers which i think can help performance in general. that said in the past i found cloudflare performance terrible when i was doings lots of testing. they are predominantly a pull based system not a push, so if content isn't current the cache miss performance can be kind of blah. i think their general backhaul paths have improved, but at least from new zealand they used to seem to do worse than hitting a los angeles proxy that then hits origin. (although google was in a similar position before, where both 8.8.8.8 and www.google.co.nz/.com were both faster via los angeles than via normal paths - i think google were doing asia parent, like if testing 8.8.8.8 misses it was super far away). i think now that we have http/3 etc though that performance is a bit simpler to achieve, and that ddos, bot protection is kind of the differentiator, and i think that cloudflare's bot protection may work reasonably well in general?