
Apple CPUs come with their own GPUs on die and RAM in the chip package now. How much more is going to be put on the chip and assembled in increasingly fine grained processes?




Apple puts the RAM in the chip package because they integrate the GPU, and then they want to be able to have multiple channels to feed the GPU without having that many slots. (Their entry level models also don't have any more memory bandwidth than normal PC laptops and there is no real reason they couldn't use a pair of SODIMMs.)
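For a rough sense of those bandwidth numbers, here is a back-of-the-envelope Python sketch; the DDR5 speed and the Apple figure are typical published values, not measurements of any particular machine:

    # Peak-bandwidth arithmetic for a dual-channel DDR5 laptop vs. an
    # entry-level Apple part. Figures are typical published numbers.
    def ddr_bandwidth_gb_s(channels, mt_per_s, bytes_per_channel=8):
        # each DDR channel is 64 bits (8 bytes) wide
        return channels * mt_per_s * bytes_per_channel / 1000

    pc = ddr_bandwidth_gb_s(channels=2, mt_per_s=5600)  # DDR5-5600, two channels
    print(f"dual-channel DDR5-5600: ~{pc:.0f} GB/s")    # ~90 GB/s

    # Apple's entry M-series parts quote on the order of 100 GB/s,
    # i.e. the same ballpark a pair of SODIMMs could deliver.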

But low end iGPUs don't need a lot of memory bandwidth (again witness Apple's entry level CPUs) and integrating high end GPUs makes you thermally limited. There is a reason that Apple's fastest (integrated) GPUs are slower than Nvidia and AMD's fastest consumer discrete GPUs.

And even if you are going to integrate all the memory, as might be more justifiable if you're using HBM or GDDR, that only makes it easier to not integrate the CPU itself. Because now your socket needs fewer pins since you're not running memory channels through it.

Alternatively, there is some value in doing both. Suppose you have a consumer CPU socket with the usual pair of memory channels through it. Now the entry level CPU uses that for its memory. The midrange CPU has 8GB of HBM on the package and the high end one has 32GB, which it can use as the system's only RAM or as an L4 cache while the memory slots let you add more (less expensive, ordinary) RAM on top of that, all while using the same socket as the entry level CPU.

And let's apply some business logic to this: Who wants soldered RAM? Only the device OEMs, who want to save eleven cents' worth of slots and, more importantly, overcharge for RAM and force you to buy a new device when all you want is a RAM upgrade. The consumer and, more than that, the memory manufacturers prefer slots, because they want you to be able to upgrade (i.e. to give them your money). So the only time you get soldered RAM is when either the device manufacturer has you by the short hairs (i.e. Apple if you want a Mac) or the consumer isn't paying attention and accidentally buys a laptop with soldered RAM when competitors are offering similar ones for similar prices but with upgradable slots.

So as usual, the thing preventing you from getting screwed is competition and that's what you need to preserve if you don't want to get screwed.


> integrating high end GPUs makes you thermally limited.

Even if you have a surface area equivalent to a high end CPU and a high end GPU, combined in a single die?


A high end CPU (e.g. Threadripper) is 350W. A high end GPU (e.g. RTX 5090) is 575W. That's over 900W. You're past the point of die area and now you're trying to get enough airflow in a finite amount of space without needing five pounds of copper or 10000RPM fans.
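As a back-of-the-envelope check (the TDPs are the published figures for those parts; the 500 W single-cooler ceiling is purely an assumption for illustration):

    # Combined power of a high-end CPU + high-end GPU vs. what one
    # consumer-sized heatsink/airflow path can plausibly handle.
    cpu_tdp_w = 350               # e.g. Threadripper-class CPU
    gpu_tdp_w = 575               # e.g. RTX 5090-class GPU
    combined_w = cpu_tdp_w + gpu_tdp_w
    print(f"combined package power: {combined_w} W")                  # 925 W

    single_cooler_limit_w = 500   # assumed practical limit for one socket
    print(f"over budget by: {combined_w - single_cooler_limit_w} W")  # 425 W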

Separate packages get you more space, separate fans, separate power connectors, etc.

In theory you could do the split in a different way, i.e. do SMP with APUs like the MI300X, and then you have multiple sockets with multiple heatsinks but they're all APUs. But you can see the size of the heatsink on that thing, and it's really a GPU they integrated some CPU cores into rather than the other way around. The power budget is heavily disproportionately the GPU. And it's Enterprise Priced so they get to take the "nobody here cares about copper or decibels" trade offs that aren't available to mortals.


You must surely know that Apple didn't originate any of that.

Yes, for sure. But they’ve made it the norm. I don’t think I’m going to buy a more traditional computer again (unified RAM just works so well for local AI), and the computer makers are going to adopt it completely eventually.

Hard disagree on it working well for local AI - all the memory bandwidth in the world doesn’t matter when the GPU it’s connected to is middling in performance compared to dedicated options. Give me one (or several) 3090/4090/5090 any day of the week over a Mac.

I’ve got an M3 Max with 64GB, and I can run larger models than a single 5090 can. Yes, the GPU isn’t as fast, but I have a lot more memory and the GPU still doesn’t suck that badly.

You illustrated my point exactly: yes, a single 32GB 5090 has half the memory of your Mac. But two of them (or three 3090/4090s) have the same total memory as your Mac, are in the same ballpark in price, and would be several times faster at running the same model as your Mac.

And before you bring up the “efficiency” of the Mac: I’ve done the math, and between the Mac being much slower (thus needing more time to run) and the fact that you can throttle the discrete GPUs to use 200-250W each and only lose a few percent in LLM performance, it’s the same price or cheaper to operate the discrete GPUs for the same workload.
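To make that math concrete, here is a toy Python sketch of the energy comparison; every figure (power draw, speedup) is an assumption for illustration, not a benchmark:

    # Energy per fixed LLM workload: the dGPUs come out ahead whenever
    # their speedup exceeds their power ratio. All numbers are assumptions.
    mac_power_w  = 140          # assumed average draw of the Mac under load
    dgpu_power_w = 2 * 225      # two discrete GPUs power-limited to ~225 W each
    speedup      = 3.5          # assumed: dGPUs finish the same job 3.5x sooner

    mac_wh  = mac_power_w * 1.0               # nominal 1-hour Mac run
    dgpu_wh = dgpu_power_w * (1.0 / speedup)  # same job on the dGPUs
    print(f"Mac: {mac_wh:.0f} Wh, dGPUs: {dgpu_wh:.0f} Wh")  # 140 vs ~129 Wh
    # dGPUs win on energy as long as speedup > dgpu_power_w / mac_power_w (~3.2x)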


I don't know. Can you bring your GPUs on an inter-continental plane trip and play with LLMs on the plane? It isn't really that slow for 4-bit quantized 70B models. These are very good CPU/GPUs, and they are only getting better.

Sure, the GPUs sit in my basement and I can connect to them from anywhere in the world.

My point was not that “it isn’t really that slow,” my point is that Macs are slower than dedicated GPUs, while being just as expensive (or more expensive, given the specific scenario) to purchase and operate.

And I did my analysis using the Mac Studio, which is faster than the equivalent MBP at load (and is also not portable). So if you’re using a MacBook, my guess is that your performance/watt numbers are worse than what I was looking at.


The whole point of having it local is not to use the network, or not to need it, or not to need to jump the GFW (the Great Firewall) when you are in China.

The Ultra is about 2x the power of a Max, but the Max itself is pretty beefy, and it has more than enough GPU power for the models you can fit into ~48GB of RAM (what you have available if you are running with 64GB of memory).
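For reference, a rough weights-only footprint estimate (a Python sketch; it ignores KV cache and runtime overhead, so treat the numbers as lower bounds):

    # Weights-only size of a quantized model, to see what fits in the
    # ~48 GB of unified memory usable on a 64 GB Mac.
    def weights_gb(params_billion, bits_per_weight):
        # params_billion * 1e9 params * bits/8 bytes, divided by 1e9 for GB
        return params_billion * bits_per_weight / 8

    for params, bits in [(70, 4), (70, 5), (32, 8)]:
        print(f"{params}B @ {bits}-bit: ~{weights_gb(params, bits):.0f} GB")

    # 70B @ 4-bit: ~35 GB  -> fits in ~48 GB with room for the KV cache
    # 70B @ 5-bit: ~44 GB  -> tight
    # 32B @ 8-bit: ~32 GB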


If you travel to China, sure, what I’m talking about probably won’t work for you.

In pretty much any other situation, using dedicated GPUs is 1) definitely faster, like 2x the speed or more depending on your use case, and 2) the same cost or possibly cheaper. That’s all I’m saying.




