Is 128 GB of unified memory enough? I've found that the smaller models are great...

simonw · 2025-10-15T06:10:50 1760508650

There are several 70B+ models that are genuinely useful these days.

I'm looking forward to GLM 4.6 Air - I expect that one should be pretty excellent, based on experiments with a quantized version of its predecessor on my Mac. https://simonwillison.net/2025/Jul/29/space-invaders/

magicalhippo · 2025-10-15T07:25:30 1760513130

Depending on you use-case, I've been quite impressed with GPT-OSS 20B with high reasoning effort.

The 120B model is better but too slow since I only have 16GB VRAM. That model runs decent[1] on the Spark.

[1]: https://news.ycombinator.com/item?id=45576737

cocogoatmain · 2025-10-15T07:30:31 1760513431

128gb unified memory is enough for pretty good models, but honestly for the price of this it is better just go go with a few 3090s or a Mac due to memory bandwidth limitations of this card

behnamoh · 2025-10-15T06:46:09 1760510769

the question is: how does the prompt processing time on this compare to M3 Ultra because that one sucks at RAG even though it can technically handle huge models and long contexts...

zozbot234 · 2025-10-15T08:43:49 1760517829

Prompt processing time on Apple Silicon might benefit from making use of the NPU/Apple Neural Engine. (Note, the NPU is bad if you're limited by memory bandwidth, but prompt processing is compute limited.) Just needs someone to do the work.