Can someone explain why we measure these datacenters in Gigawatts rather than something that actually measures compute like flops or whatever the AI equivalent of flops is?
To put it another way, I don't know anything but I could probably make a '1 GW' datacenter with a single 6502 and a giant bank of resistors.
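To put a rough number on that joke (everything below is an assumed figure, not anything measured):

    # A '1 GW' datacenter built from a 6502 and resistors: the CPU draws well
    # under a watt and does on the order of 1 MIPS, so essentially the whole
    # gigawatt is resistors producing zero compute.
    CPU_POWER_W = 0.5        # assumed 6502 draw
    CPU_OPS_PER_S = 1e6      # ~1 MIPS, integer ops only, no FPU at all
    TOTAL_POWER_W = 1e9      # the advertised "1 GW"

    resistor_w = TOTAL_POWER_W - CPU_POWER_W
    print(f"resistor bank: {resistor_w:.3e} W, compute contribution: 0")
    print(f"whole site: {CPU_OPS_PER_S:.0e} ops/s at 1 GW -> wattage says nothing about compute")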
Yes! The varying precisions and math feel like just the start!
Look at next-gen Rubin with its CPX co-processor chip to see things getting much weirder & more specialized. It's there for prefilling long contexts, which is compute intensive:
> Something has to give, and that something in the Nvidia product line is now called the "Rubin" CPX GPU accelerator, which is aimed specifically at parts of the inference workload that do not require high bandwidth memory but do need lots of compute and, increasingly, the ability to process video formats for both input and output as part of the AI workflow.
To confirm what you are saying, there is no coherent unifying way to measure what's getting built other than by power consumption. Some of that budget will go to memory, some to compute (some to interconnect, some to storage), and it's too early to say what share each will get, or even what compute:memory ratio we're heading towards (and one size won't fit all problems).
Perhaps we end up abandoning HBM & DRAM! Maybe the future belongs to high-bandwidth flash! Maybe with its own Computational Storage! Trying to use figures like flops or bandwidth is applying today's answers to a future that might get weirder on us. https://www.tomshardware.com/tech-industry/sandisk-and-sk-hy...
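One way to see why a single compute figure is hard to pin down today: the answer swings wildly with whatever compute:memory:interconnect split the future settles on. A tiny sketch with purely hypothetical splits and an assumed FLOPS/W figure:

    # Hypothetical splits of a 1 GW budget, just to show how much the
    # "how many flops is that?" answer moves with the compute share.
    TOTAL_W = 1e9
    FLOPS_PER_W = 50e9   # assumed effective FLOPS per watt for the compute share

    for compute_share in (0.8, 0.6, 0.4):  # rest: memory, interconnect, storage, cooling
        print(f"compute share {compute_share:.0%}: ~{TOTAL_W * compute_share * FLOPS_PER_W:.1e} FLOPS")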
As a reference for anyone interested - the cost is estimated to be $10 billion for EACH 500 MW data center, and this includes the cost of the chips and the data center infra.
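A quick back-of-envelope on what that buys (the per-unit numbers below are my own rough assumptions, not from the estimate):

    # Rough split of $10B / 500 MW into accelerators vs everything else.
    SITE_W = 500e6
    SITE_COST_USD = 10e9
    W_PER_ACCEL_SLOT = 2000.0   # assumed: accelerator plus its share of cooling, network, overhead
    COST_PER_ACCEL = 30000.0    # assumed all-in chip cost per accelerator

    n_accel = SITE_W / W_PER_ACCEL_SLOT
    chip_cost = n_accel * COST_PER_ACCEL
    print(f"~{n_accel:,.0f} accelerators fit in the power budget")
    print(f"~${chip_cost / 1e9:.1f}B of chips, ~${(SITE_COST_USD - chip_cost) / 1e9:.1f}B of everything else")
    print(f"overall: ${SITE_COST_USD / SITE_W:.0f} per watt of capacity")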
Hm, from my recently growing, but still tiny, experience with HW & DC ops:
You have a lot more things in a DC than just GPUs consuming power and producing heat. GPUs are the big ones, sure, but after a while, switches, firewalls, storage units, other servers and so on all contribute significantly to the power footprint. A big small-packet, high-throughput firewall packs a surprisingly large amount of compute, eats a surprising amount of power and generates a lot of heat. Oh, and it costs as much as a couple of cars in total.
And that's the important abstraction / simplification you get when you start running hardware at scale. Your limitation is not necessarily TFlops, GHz or GB per cubic meter. It is easy to cram a crapton of those into a small place.
The main problem after a while is the ability to put enough power into the building and to move the heat out of it again. It sure would be easy to put a lot of resistors into a place to create a lot of power consumption; Hamburg Energy is currently building just that to bleed off excess solar power into the district heating grid.
The hard part is connecting that to the 10 kV power grid safely and moving the heat away from the system fast enough.
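To give a sense of the scale of the heat problem, a rough sketch (the 100 MW hall size and the temperature delta are assumptions):

    # Every watt going in comes back out as heat, so cooling flow has to scale
    # with the power draw: Q = m_dot * c * dT  =>  m_dot = Q / (c * dT)
    HEAT_W = 100e6      # assume a 100 MW hall, essentially all of it becomes heat
    C_WATER = 4186.0    # J/(kg*K), specific heat of water
    DELTA_T = 10.0      # K rise across the cooling loop, assumed

    flow_kg_s = HEAT_W / (C_WATER * DELTA_T)
    print(f"~{flow_kg_s:,.0f} kg of water per second (~{flow_kg_s / 1000:.1f} m^3/s) just to carry the heat out")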
My understanding is that there is no universal measure of compute power that applies across different hardware and workloads. You can read the power number as roughly the maximum amount of compute you can get for that power at a given time (or at least at time of install), and it works across geographies, cooling methods, etc.
Think of it like refining electricity: a data center has a supply of raw electricity and a capacity for how much waste (heat) it can handle. The quality of the refining improving over time doesn't change the supply or waste capacity of the facility.
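That framing in a few lines: the facility's intake and waste capacity stay fixed, and only the "refining quality" changes with each hardware generation (the FLOPS-per-watt figures below are assumed, purely for illustration):

    # The site is sized in watts; what those watts refine into keeps changing.
    SITE_W = 1e9   # a 1 GW facility, constant over its life

    flops_per_w_by_gen = {   # assumed effective wall-plug FLOPS per watt
        "gen N":   20e9,
        "gen N+1": 40e9,
        "gen N+2": 80e9,
    }
    for gen, fpw in flops_per_w_by_gen.items():
        print(f"{gen}: same 1 GW in, same 1 GW of heat out, ~{SITE_W * fpw:.1e} FLOPS")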
Because, to us tech nerds, GPUs are the core thing. With a PM hat on, it's the datacenter in toto. Put another way: how can we measure in flops? By the time all this is built out we're on the next gen of cards.
Assuming a datacenter is more or less filled with $current_year chips, the number of flops is kind of a meaninglessly large number. It's big. How big? Big enough that it needs a nuclear power plant to run.
Not to mention it would assume that number wouldn't change... but of course it depends entirely on what type of compute is there, and every few years truckloads of hardware get replaced and the compute goes up.
It simplifies marketing. They probably don't really know how many flops or anything else they'll end up with anyway, so gigawatts is a nice way to look big.