Is 128 GB of unified memory enough? I've found that the smaller models are great as a toy but useless for anything realistic. Will 128 GB hold any model that you can do actual work with, or query for answers that are actually useful?
There are several 70B+ models that are genuinely useful these days.
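For a rough sense of what fits, here is a back-of-envelope sketch. It counts weights only (parameters times bits per weight) and ignores KV cache, activations, and runtime overhead, so treat the results as lower bounds. The function name and the bit widths are my own illustration, not tied to any particular runtime.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits per param / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


# Illustrative bit widths: unquantized 16-bit, then 8-bit and 4-bit quantization.
for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: ~{weight_gb(70, bits):.0f} GB of weights")
```

By this estimate a 70B model at 16-bit is already over 128 GB in weights alone, while 8-bit and 4-bit quantizations leave headroom for context.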
I'm looking forward to GLM 4.6 Air - I expect that one should be pretty excellent, based on experiments with a quantized version of its predecessor on my Mac. https://simonwillison.net/2025/Jul/29/space-invaders/
128 GB of unified memory is enough for pretty good models, but honestly, for the price of this it's better to just go with a few 3090s or a Mac, given this card's memory bandwidth limitations.
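To see why bandwidth matters so much for generation speed, here is a minimal sketch. It assumes a dense model at batch size 1, where every weight is read from memory once per generated token, so bandwidth divided by model size gives a rough ceiling on tokens per second. The bandwidth figures below are hypothetical placeholders, not measurements of any specific hardware.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on decode speed when memory bandwidth is the bottleneck."""
    # Each generated token requires streaming all weights once (dense model, batch 1).
    return bandwidth_gb_s / weights_gb


weights_gb = 35.0  # e.g. a 70B model at 4-bit, weights only (assumption)
for label, bandwidth in (("lower-bandwidth box", 250.0), ("higher-bandwidth box", 1000.0)):
    ceiling = max_decode_tokens_per_s(bandwidth, weights_gb)
    print(f"{label}: at most ~{ceiling:.1f} tokens/s")
```

Real throughput will land below this ceiling, and mixture-of-experts models change the picture because only a fraction of the weights is read per token.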
The question is: how does the prompt processing time on this compare to the M3 Ultra? That one sucks at RAG even though it can technically handle huge models and long contexts...
Prompt processing time on Apple Silicon might benefit from making use of the NPU/Apple Neural Engine. (Note: the NPU is bad if you're limited by memory bandwidth, but prompt processing is compute-limited.) It just needs someone to do the work.
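A rough sketch of why prompt processing is compute-limited: it uses the common approximation of about 2 FLOPs per parameter per token for a dense transformer, and ignores the attention cost that grows with context length. The throughput figure is a hypothetical placeholder, not a spec for any real chip.

```python
def prefill_seconds(params_billion: float, prompt_tokens: int, tflops: float) -> float:
    """Estimate time to process a prompt when raw compute is the bottleneck."""
    # ~2 FLOPs per parameter per token (one multiply, one add), dense model assumed.
    total_flops = 2 * params_billion * 1e9 * prompt_tokens
    return total_flops / (tflops * 1e12)


# Hypothetical: 70B model, 10,000-token RAG prompt, 50 TFLOPS of usable compute.
print(f"~{prefill_seconds(70, 10_000, 50):.0f} s to process the prompt")
```

Doubling usable compute halves this estimate, which is why extra matrix-multiply throughput helps prefill even when it does nothing for token generation.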