The problem is that at that scale, the alternative is building your own data centers. You'd probably want at least 2 in the US, 2 in Europe, 2 in Asia, maybe 1 in Africa and 1 in LATAM. So 8-10, and you need at least half of them ready "on time."
What does "on time" mean? You'll need to negotiate with local authorities, some friendly, some not. Data centers aren't exactly popular neighbors these days. Then negotiate with the local power utility. Fingers crossed the political landscape doesn't shift and your CEO doesn't sign a contract with an army using your product to pick bombing targets, because you'll watch those permits evaporate fast.
Then there's sourcing: CPUs, GPUs, memory, networking. You need all of it. Did you know the lead time for an industrial power transformer is 5+ years? Don't get me started on the water treatment pumps and filters you can't even get your permits without. What will you do in the meantime? You surely aren't going to get preferential treatment from AWS / Google / ... if they know you're moving away anyway. Your competitors will.
The risk and complexity are just too big. AI/LLM is already an incredibly complex and brittle environment with huge competition. Getting distracted building data centers isn't enticing for these companies, it's a death sentence.
For AI inference you don't need to geographically distribute your data centers. Latency, throughput, and routes don't matter here. When it's 10 seconds for the first token and then a 1KB/sec streamed response, whatever is fine. You can serve Australia from the US and it'll barely matter. You can find a spot far outside populated areas with cheap power, available water, and friendly leadership, then put all of your data centers there. If you're worried about major disasters, you can pick a second city. You definitely don't need a data center in every continent.
You're not wrong about the rest but no AI company would ever build a data center in every continent for this, even if they were prepared to build data centers. AI inference isn't like general purpose hosting.
>Latency, throughput, and routes don't matter here. When it's 10 seconds for the first token and then a 1KB/sec streamed response, whatever is fine. You can serve Australia from the US and it'll barely matter.
This may be true for simpler cases where you just stream responses from a single LLM in some kind of no-brain chatbot. If the pipeline is a bit more complex (multiple calls to different models, not only LLMs but also embedding models, rerankers, agentic stuff, etc.), latencies quickly add up. It also depends on the UI/UX expectations.
Funny reading this, because the feature I developed can't go live for a few months in regions where we have to use Amazon Bedrock (for legal reasons), simply because Bedrock has very poor latency and stakeholders aren't satisfied with the final speed (users aren't expected to wait 10-15 seconds in that part of the UI; it would be awkward). And a single round trip from Asia to AWS Ireland is already at least ~300ms (multiply by several calls in a pipeline and the round trips alone add up to seconds), so having only one region is not an option.
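To make that arithmetic concrete, here's a rough back-of-envelope sketch; the 300ms RTT and the call count are illustrative assumptions, not measured numbers:

```python
# Rough latency budget for a multi-call LLM pipeline.
# All numbers are illustrative assumptions, not measurements.

RTT_MS = 300  # assumed one round trip, Asia -> AWS Ireland

def pipeline_network_overhead(sequential_calls: int, rtt_ms: float = RTT_MS) -> float:
    """Network overhead in ms for N sequential cross-region calls."""
    return sequential_calls * rtt_ms

# e.g. embed -> rerank -> LLM -> LLM follow-up = 4 sequential calls
overhead = pipeline_network_overhead(4)
print(f"{overhead} ms of round trips alone, before any inference happens")
```

With 4 sequential calls you're already at 1.2 seconds of pure network time, which is why a single far-away region blows the UX budget even if each model is fast.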
Funny though, in one region we ended up buying our own GPUs and running the models ourselves. Response times there are about 3x faster for the same models than on Bedrock on average (and Bedrock often hangs for 20+ seconds for no reason, despite all the tricks like cross-region inference and premium tiers AWS managers recommended). For me, it's been easier and less stressful to run LLMs/embedders/rerankers myself than to fight cloud providers' latencies :)
>then put all of your data centers there
>You definitely don't need a data center in every continent.
Not always possible due to legal reasons. Many jurisdictions already have (or plan to have) strict data processing laws. Also, many B2B clients (and government clients too) require all data processing to stay in the country, or at least the region (like the EU), or we simply lose the deals. So, for example, we're already required to use data centers on at least 4 continents; just 2 more continents to go (if you don't count Antarctica :)
Sounds like you're betting that the performance users experience today will be the same as the performance they'll expect tomorrow. I wouldn't take that bet.
You mean that if you were Anthropic, you'd build the data centers on every continent? Can you explain your reasoning?
We're talking about billions of dollars of extra capex if you take the "let's build them everywhere" side of the bet instead of the "let's build them in the cheapest possible place" side. It seems to me that you'd have to be really sure that you need the data center to be somewhere uneconomical. I think if you did build them in the cheap place, it's a safe bet that you'll always have at least enough latency-insensitive workloads to fill them up. I doubt that we would transition entirely to latency-sensitive workloads in the future, and that's what would have to happen for my side of the bet to go wrong. The other side goes wrong if we don't see a dramatic uptick in latency-sensitive inference workloads. As another comment pointed out, voice agents are the one genuinely latency-sensitive cloud inference workload we have right now. Such workloads exist, but they're a slim percentage so far.
I believe I'm taking the safe bet that lets Anthropic make hay while the sun shines without risking a major misstep. Nothing stops them from using their own data centers for cheap slow "base load" while still using cloud partners for less common specialized needs. I just can't see why they would build the international data centers to reduce cloud partner costs on latency-sensitive workloads before those workloads actually show up in significant numbers.
They want it, sure. Customers want everything if it's free, but this is about what they value with their money. In this thought experiment, you're Anthropic, not the customer. You're making a choice that's best for Anthropic. Will Anthropic lose customers because the latency is higher? No way. Customers want low cost and lots of usage more than they want low latency. In a cutthroat race to the bottom, there's no room to "give away" massively expensive freebies like a data center near every population center when the customer doesn't value those extras with actual money. It's the same reason we all tolerate the relatively slow batched token generation rate--the batching dramatically lowers the cost, and we need low cost inference more than we want fast generation. If the cost goes up we'll actually leave, for real.
After the initial announcement of "fast mode" in Claude Code, did you ever hear about anyone using it for real? I didn't. Vanishingly few people are willing to pay extra for faster inference.
Remember that the time-to-first-token is dominated by the time to process the prompt. It's orders of magnitude more latency than the network route is adding. An extra 200 milliseconds of network delay on a 5-10 second time-to-first-token is not even noticeable; it's within the normal TTFT jitter. It would be foolish to spend billions of dollars to drop data centers around the world to reduce the 200 milliseconds when it's not going to reduce the 5-10 seconds. Skip the exotic locales and put your data centers in Cheap Power Tax Haven County, USA. Perhaps run the numbers and see if Free Cooling City, Sweden is cheaper.
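As a sanity check on that claim, a tiny sketch; the TTFT and delay figures are the assumed numbers from this thread, not measurements:

```python
# How much does 200 ms of extra network delay matter against a
# multi-second, prompt-processing-dominated time-to-first-token?
# Numbers are illustrative assumptions.

ttft_s = 7.5           # assumed TTFT (midpoint of the 5-10 s range above)
extra_network_s = 0.2  # assumed extra cross-continent network delay

relative_increase = extra_network_s / ttft_s
print(f"{relative_increase:.1%} slower")  # well inside normal TTFT jitter
```

A ~3% bump on an already multi-second wait is the kind of thing users can't perceive, which is the whole argument for one cheap location.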
They’re unwilling to pay for fast mode because of the current step function price increase once you hit your quota. It’s a psychological effect. Because most shops I know in the US currently paying $125/mo per seat for Claude would happily - HAPPILY - pay 2x, and begrudgingly pay 10x that amount for the same service. If fast mode was priced 25% or 50% more they’d happily pay for that too. But it’s just not priced that way currently with weird growth subsidization & psychology.
The only AI use case that cares about latency is interactive voice agents, where you ideally want <200ms response time, and 100ms of network latency kills that. For coding and batch job agents anything under 1s isn't going to matter to the user.
tbh, that's a good point about the voice agents that I hadn't considered. I guess there are some latency-sensitive inference workloads. Thanks for pointing that out.
A customer service chatbot can require more than one LLM call per response to the point that latency anywhere in the system starts to show up as a degraded end-user experience.
Easy solution: use hyperscalers, with their super expensive API charges, only when latency really matters. Otherwise build your own DC. It's safe to expect customers don't care about latency nearly as much as they care about money.
Large data centers consume as much power as a small city. The location decision is about being able to connect to a power grid that is ready to supply that.
Evaporative cooling also needs a steady water supply. There are data centers that don't operate on evaporative cooling, but it's more equipment-intensive and expensive.
Latency doesn’t matter. You can get fast enough internet connected to these sites much more easily than finding power.
Location matters for disaster recovery, if they want to survive WWIII. Though I think Data Sovereignty is probably a bigger thing, especially if they're going to be selling to governments around the world.
* Not every task is waiting on the inference. Lowering latency on other, serial tasks can still have a noticeable effect: login, MCP queries, etc.
* Data transit across the world can be very slow when there are network issues (a fiber is cut somewhere, congestion, BGP does its thing, etc.). Having something more local can mitigate this.
* Several countries right now have demented leaders with idiotic cult-like followers. Best not to put all your eggs in those baskets.
* Wars, earthquakes, fires, floods, and severe weather rarely affect the whole planet at once, but can have rippling effects across a continent.
And frankly, the real question isn't "why spread out the DCs?", it's "what reason is there to put them close to each other?".
Btw, where does this obsession with datacenters come from? If you can tolerate ~150ms ping (which chatbots certainly can, since their internal processing can take much longer), you can serve the US and Europe from a single US location, and the whole planet if you can tolerate ~300ms. (Asian websites are usually very slow to load for me; I think that has to do with the way the internet is set up, which is mostly a commercial limitation rather than a physical one, as Western companies rarely have good market penetration in Asia.)
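For a rough sense of where those ping numbers come from: assuming signals propagate through fiber at about 200,000 km/s and routes are ideal great circles (real paths are longer and add equipment delay, so these are lower bounds):

```python
# Back-of-envelope round-trip time over fiber.
# Assumes ~200,000 km/s propagation in glass and straight-line paths;
# real routes add distance, hops, and queueing on top of this floor.

FIBER_KM_PER_S = 200_000

def rtt_ms(distance_km: float) -> float:
    """Minimum round-trip time in milliseconds for a given distance."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000

print(f"US east coast -> Western Europe (~6,000 km): {rtt_ms(6000):.0f} ms")
print(f"US -> Australia (~15,000 km): {rtt_ms(15000):.0f} ms")
```

So ~60ms is the physical floor for transatlantic and ~150ms for transpacific; the observed ~150/~300ms figures are those floors plus real-world routing overhead.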
Maybe for right now, but even in the very near future it seems like data center expertise would absolutely be a core competency of any AI leaders.
Heck, look at Facebook. Granted, they got started slightly before AWS, but not by much. Owning all of their own data centers is a huge competitive advantage for them, and unlike most of the other hyperscalers they don't sell compute to other companies (AFAIK).
Again, the commitment is for $100 billion in spend. Building lots of data centers for a lot cheaper than that price should absolutely be doable. Also, geographic distribution isn't nearly as important for AI companies given the way LLMs work. The primary benefit of being close to your data center is reduced latency, but if you think about your average chatbot interface, inference time absolutely swamps latency, so it's not as big a deal. Sure, you'd probably need data centers in different locales for legal reasons, and for general diversification, but, one more time, $100 billion should buy a lot of data centers.
It's interesting that you mention Facebook. They have a ton of their own data centers and yet they are now also spending tens of billions on cloud. It's not that easy to build hundreds of data centers on short notice.
Translation: Anthropic never intends to spend $100 billion on AWS.
Every single argument you've brought up is irrelevant in the face of billions of dollars. If you intend to consume $100 billion in data center infrastructure, you're going to find a way to accomplish it while cutting out the middlemen.
Meanwhile if you're flaky and never intend to spend that money, you're going to come up with a way to pay someone else to deal with those problems and quit paying the moment they don't.
You'd never do both at the same time. You'd never commit your money and give them control over your business critical infrastructure.
Hence the deal is a sham. The $100 billion is a lie. Thank you for telling us.
Take the approach Geohot is suggesting: take a shipping container, make a standard layout for cooling and compute load, find a cheap source of electricity, place it, and have compute.
It has been done... We used to get our POP gear built out from Dell (?) in shipping containers - pre-racked, wired, and cooled - just add network/power feeds. We'd have them dropped places we needed more capacity but there wasn't space available in the DC.
Not sure what you are describing; however, a random data point from the business news is that in 2026 low-tech Chile is building sixty datacenters in or near Santiago.
I have found that the main phone providers (Apple, Google, Samsung) have extremely long support periods. I really don't get the "planned obsolescence" thing.
As an example, in Jan 2026, Apple published iOS 12.5.8, which provides updates for the iPhone 5s, released in Sept 2013. That's over 12 years ago. The equivalent would be connecting to the internet using ADSL in Jan 2000 with your IBM PS/2 rocking an Intel 8086 and 512 KB of RAM, and expecting an update for your DOS operating system.
>As an example, in Jan 2026, Apple published iOS 12.5.8 which provides updates for iPhone 5s which released in Sept 2013. That's 12.5 years ago. The equivalent would be to connect to the internet using ADSL in Jan 2000 with your IBM PS/2 rocking in intel 8086, 512 kb of RAM and expecting an update for your DOS operating system.
The updates for iOS 12 are all security updates, not feature updates, so your comparison to "connect to the internet using ADSL in Jan 2000 with your IBM PS/2 rocking an Intel 8086" doesn't really make sense. The phones stuck on iOS 15 are basically unusable because many apps don't support it anymore. At best you can download an older version from a few years ago, but that depends on whether the backend servers were updated. Apps that insist you use the latest version (e.g. banking/finance apps) are basically unusable.
Believe it or not, "apps" are an important "feature" of a smartphone, even if they're not theoretically bundled with it. Moreover, it's not just banking apps; those are just the first to go, but any app that doesn't keep backend compatibility will eventually break.
The entire point of the cellphone is that third party apps are required to live a modern life. If I cannot run the apps required to pay for a parking spot or perform a 2FA ritual then there’s really no point in even having a phone. The first party software isn’t compelling enough to justify the pocket space.
You could always keep your phone and get a second dirt-cheap phone just for the 2FA (or use your banks' non-phone 2FA methods). But if we take your requirement that one phone should be able to do everything that new phones can do, it's somewhat tautological that you have to replace your phone frequently to stay on the cutting edge.
IBM PC DOS 2000 was a thing that was published and sold. It would have run fine on a system similar to what you describe. It addressed the only pressing thing in that space at that time that PC DOS 7 did not: Y2K compliance.
(I never had a PS/2, or ADSL, but I was goofing around with a low-memory 8088 box back then for fun. It had no hard drive. It bootstrapped from floppy, loaded the rest over the LAN with its built-in 10base2 Ethernet jack from my Linux box, and connected to dual-channel ISDN for Internet access. It worked. It even had a graphical web browser.
Being clever with an old iPhone is a very different thing.)
How do you handle SSL pinning? Most of the apps I interact with have some sort of SSL pinning, which is the hard part to circumvent. I tried Kampala but got stuck at the usual place: as soon as I enable it, ChatGPT stops working, most of my iPhone apps stop responding, etc.
I would love to try using this tool to build an agent that can simply subscribe me to my gym lessons instead of me having to go on the horrible app. But even that relatively simple (iOS) app stopped working as soon as I enabled the proxy.
Unfortunately we can’t do much around SSL pinning yet. Not sure how deep you want to go, but there are several Frida scripts that patch common pinning implementations.
I also think mitmproxy (open source) has an option to spin up a virtual Android device that can bypass pinning via AVD. I have not tested how reliable it is though.
FWIW, it could also be a cert trust issue. I would try a quick Safari search to confirm the cert is fully trusted. ChatGPT is pinned, but the gym app makes me think it might be a trust or config issue on your device.
Happy to take a look as well. Email me at alex at zatanna dot ai.
SSL pinning on iOS is a real blocker for any tool working at the network layer; the reliable path is going through the native XCUITest layer instead of intercepting traffic. We hit exactly this building mobile QA support in Autonoma (https://www.getautonoma.com)
> why one person potentially being responsible for hundreds or thousands of deaths is acceptable
I am not sure who exactly that one person is. Is it Altman, who according to many people is not that knowledgeable in AI in the first place; the scientist who found the breakthrough (who is it?); the president of the United States who is greenlighting the strikes; the general who is choosing the target (based on AI suggestions); the missile designer; the manufacturer; or the pilot who flew the plane?
I get the point about concentrating power in fewer hands, but the whole "all the problems of this world are caused by an extremely narrow set of individuals" always irks me. Going as far as saying there is just one is even more ludicrous.
I’m fine with holding them all accountable to varying degrees. For example, yes, ultimately the president is responsible, but so is the person who dropped bombs instead of refusing an illegal order; just like the street dealer, gang banger, trafficker, and cartel boss are all guilty of all of their various crimes.
What do you find difficult to understand about that?
Ah the old 'everyone is responsible so nobody is responsible' canard.
I will give you a helpful rule of thumb: when in doubt the guy with a bank account larger than the total lifetime income of hundreds of thousands of people is probably the one to blame.
Ah the old ‘in case of doubt just go after the rich guy’. That makes stuff simple, doesn't it?
You can establish responsibilities just by counting the number of zeroes in a bank account.
On top of this, it works for everything: the same dude is responsible for wars, the climate, world hunger, child cancer and your bathroom mirror being fogged this morning.
He lost access to the wallet either by mistake (never even saved the key) or because he willingly destroyed the key for philosophical reasons. Or he is just dead.
You can already do that today by hiring a security researcher. I can guarantee you that Apple has access to people of a higher caliber than my startup.
I could see a world where 1 year from now I can have glassing do a full sweep of my codebase for a given price (say: $10k). Running that once a year is within my means and would make my software much more secure than it is today.
Yeah, but even Carlini, who is a good security researcher, said he has found more valid vulnerabilities in the last week than in his entire career before this. That sounds like it's clearly better/faster/cheaper than a human security researcher who would cost $300,000 a year.
I spend well over that of my employer's money on pentesting every year. I'm absolutely certain Claude could do as good a job or better using what's available today.
It had crossed my mind that an AI agent pentester would be an interesting product to build. Once again though, the labs are just going to build it because it’s a thin thin wrapper.
Beyond existing software with vulnerabilities, the really important aspect of this for Anthropic et al. is that the gigatons of code being generated every day need to be secured.
There are quite a few such startups already out there. Results are mixed so far, though I believe they'll get much better over the coming months and years.
Bait what, exactly? Getting the user to type "yes"? Great accomplishment.
Sometimes I want the extra paragraph, sometimes I don't. Sometimes I like the suggested follow up, sometimes I don't. Sometimes I have half an hour in front of me to keep digging into a subject, sometimes I don't.
Why should the LLM "just write the extra paragraph" (consuming electricity in the process) to a potential follow-up question a user might, or might not, have? If I write a simple question I hope to get a simple answer, not a whole essay answering stuff I did not explicitly ask for. And if I want to go deeper, typing 3 letters is not exactly a huge cost.
I’m not privy to their data on what this does to engagement, but intuitively it seems like the extra inference/token cost this incurs doesn’t align with their current model.
If they were doing it to API customers, sure, but getting the free or flat-rate customers to use more tokens seems counterproductive.
We’ll see how this plays out. It’s a turbocharged version of enshittification, at a time when other models are showing stronger growth in B2B and other valuable markets.
I canceled my ChatGPT subscription and jumped to Claude, not for silly political theater, but just because the product was better for professional use. Looking at data from Ramp and others, I’m not alone.
So humans become just providers of those 6-digit codes? That's already the main problem I have with most agents. I want them to perform a very easy task: "fetch all receipts from websites x, y, and z and upload them to the correct expense in my expense tracking tool." AI is perfectly capable of performing this. But because every website requires SSO + 2FA, with no possibility to remove it, I effectively have to watch them do it, and my whole existence can be summarized as: "look at your phone and input the 6 digits."
The thing I want AI to be able to do on my behalf is manage those 2FA steps, not add more.
This is where the Claw layer helps: rather than hoping the agent handles the interruption gracefully, you design explicit human approval gates into the execution loop. The Claw pauses, surfaces the 2FA prompt, waits for input, then resumes with full state intact. The problem IMTDb describes isn't really 2FA; it's agents that have a hard time suspending and resuming mid-task cleanly. But that's today; tomorrow is an unknown variable.
It's technically possible to use 2FA (e.g. TOTP) on the same device as the agent, if appropriate in your threat model.
In the scenario you describe, 2FA is enforcing a human-in-the-loop test at organizational boundaries. Removing that test will need an even stronger mechanism to determine when a human is needed within the execution loop, e.g. when making persistent changes or spending money, rather than copying non-restricted data from A to B.
Reading through the discussion I was also thinking of the other fly.io blog post about their macaroon-token setup, where you can quite easily reduce a token's blast radius by adding more caveats. Feels like you could build some kind of capability system out of that, which might mitigate some of the risk.
Regarding sexism: most tournaments in chess (including the world championship) are fully open and thus gender neutral: anyone can participate regardless of sex/gender and compete on equal footing.
Women-only categories were created to give women visibility, because they mostly were not able to reach advanced levels in the open format.
Some women choose to compete with men (Judit Polgár being a somewhat recent example), but most go straight to the women-only tournaments to have a shot.
The men-vs-women "bias" is not unproven; they literally had to create entire categories of competition to account for it.
That’s true for “tips and tricks” knowledge like “which model is best today” or “tell the model you’ll get fired if the answer is wrong to increase accuracy” that pops up on Twitter/X. It’s fleeting, makes people feel like “experts”, and doesn’t age well.
On the other hand, deeply understanding how models work and where they fall short, how to set up, organize, and maintain context, and which tools and workflows support that tends to last much longer. When something like the “Ralph loop” blows up on social media (and dies just as fast), the interesting question is: what problem was it trying to solve, and how did it do it differently from alternatives? Thinking through those problems is like training a muscle, and that muscle stays useful even as the underlying technology evolves.
> what problem was it trying to solve, and how did it do it differently from alternatives?
Sounds to me like accidental complexity. The essential problem is to write good code for the computer to do its task?
There's an issue if you're (general you) more focused on fixing the tool than on the primary problem, especially when you don't know if the tool is even suitable.
It does seem like things are moving very quickly, even deeper than what you are saying. Less than a year ago LangChain, model fine-tuning, and RAG were the cutting edge and the "thing to do".
Now because of models improving, context sizes getting bigger, and commercial offerings improving I hardly hear about them.