Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Is 128 GB of unified memory enough? I've found that the smaller models are great as a toy but useless for anything realistic. Will 128 GB hold any model that you can do actual work with or query for answers that returns useful information?


There are several 70B+ models that are genuinely useful these days.

I'm looking forward to GLM 4.6 Air - I expect that one should be pretty excellent, based on experiments with a quantized version of its predecessor on my Mac. https://simonwillison.net/2025/Jul/29/space-invaders/


Depending on you use-case, I've been quite impressed with GPT-OSS 20B with high reasoning effort.

The 120B model is better but too slow since I only have 16GB VRAM. That model runs decent[1] on the Spark.

[1]: https://news.ycombinator.com/item?id=45576737


128gb unified memory is enough for pretty good models, but honestly for the price of this it is better just go go with a few 3090s or a Mac due to memory bandwidth limitations of this card


the question is: how does the prompt processing time on this compare to M3 Ultra because that one sucks at RAG even though it can technically handle huge models and long contexts...


Prompt processing time on Apple Silicon might benefit from making use of the NPU/Apple Neural Engine. (Note, the NPU is bad if you're limited by memory bandwidth, but prompt processing is compute limited.) Just needs someone to do the work.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: