The model weights are 70GB (Hugging Face recently added a file size indicator - ...

a_e_k · 2025-09-22T19:34:58 1758569698

That's at BF16, so it should fit fairly well on 24GB GPUs after quantization to Q4, I'd think. (Much like the other 30B-A3B models in the family.)

I'm pretty happy about that - I was worried it'd be another 200B+.
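For anyone who wants to sanity-check the Q4 claim, here is a rough back-of-the-envelope sketch. It assumes the whole 70GB file is BF16 weights (2 bytes per parameter) and that a Q4-style quant averages about 4.5 bits per weight; both figures are my assumptions, and it ignores KV cache, activations, and any components kept at higher precision.

```python
# Rough weight-size estimate. The 70 GB figure comes from the thread;
# the bits-per-weight values below are assumptions, not measurements.
BF16_SIZE_GB = 70
BITS_BF16 = 16


def params_billions(size_gb, bits_per_weight):
    """Parameter count (in billions) implied by a file size and precision."""
    return size_gb * 8 / bits_per_weight


def weights_gb(params_b, bits_per_weight):
    """Approximate weight size in GB for a given precision."""
    return params_b * bits_per_weight / 8


params = params_billions(BF16_SIZE_GB, BITS_BF16)

for name, bits in [("BF16", 16), ("8-bit", 8), ("Q4 (assumed ~4.5 bpw)", 4.5)]:
    print(f"{name}: ~{weights_gb(params, bits):.1f} GB")
```

Under those assumptions the Q4 weights come in under 24GB, with the remainder available for context.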

numpad0 · 2025-09-23T15:02:51 1758639771

So like, 1x32GB is all you need for quite a while? Scrolling through the Web makes me feel left out unless I have a minimum of 128GB of VRAM.

zenmac · 2025-09-22T22:13:09 1758579189

Are there any that would run on a 16GB Apple M1?

bigyabai · 2025-09-22T22:41:53 1758580913

Not quite. The smallest Qwen3 A3B quants are ~12GB and use more like ~14GB depending on your context settings. You'll thrash the SSD pretty hard swapping it on a 16GB machine.

growthwtf · 2025-09-22T19:32:16 1758569536

A fun project for somebody who has more time than I do would be to see if they can get it working with the new Mojo stuff from yesterday for Apple. I don't know if the functionality is fully baked enough yet to actually do the port successfully, but it would be an interesting try.

wsintra2022 · 2025-09-22T23:44:23 1758584663

New Mojo stuff from Apple?

wsintra2022 · 2025-09-22T23:45:47 1758584747

Nvm found it https://news.ycombinator.com/item?id=45326388

varispeed · 2025-09-22T21:59:10 1758578350

Would it run on a 5090? Is it possible to link multiple GPUs, or has NVIDIA locked that down?

axoltl · 2025-09-22T22:33:42 1758580422

It'd run on a 5090 with 32GB of VRAM at FP8 quantization, which is generally a very acceptable size/quality trade-off. (I run GLM-4.5-Air at 3-bit quantization!) The transformer architecture also lends itself quite well to having different layers of the model running in different places, so you can 'shard' the model across different compute nodes.
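To illustrate the layer-sharding idea, here is a minimal sketch that assigns contiguous blocks of layers to devices based on a memory budget. The function name `shard_layers`, the layer count, the per-layer size, and the budgets are all hypothetical; real inference engines also have to account for embeddings, KV cache, and transfer overhead between devices.

```python
def shard_layers(num_layers, layer_gb, device_budgets_gb):
    """Assign contiguous layer ranges to devices, filling each in order.

    Returns a list of (device_index, first_layer, end_layer) tuples,
    where end_layer is exclusive.
    """
    placement = []
    layer = 0
    for dev, budget in enumerate(device_budgets_gb):
        # How many whole layers fit in this device's budget.
        fit = min(int(budget // layer_gb), num_layers - layer)
        if fit > 0:
            placement.append((dev, layer, layer + fit))
            layer += fit
    if layer < num_layers:
        raise ValueError("model does not fit in the given devices")
    return placement


# Hypothetical numbers: 48 layers of 0.5 GB each, two devices with 20 GB free.
plan = shard_layers(48, 0.5, [20, 20])
for dev, start, end in plan:
    print(f"device {dev}: layers {start}..{end - 1}")
```

Each device only runs its own slice of layers, and activations are handed from one device to the next in order.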

dcreater · 2025-09-22T20:32:58 1758573178

Is there an inference engine for this on macOS?

simonw · 2025-09-22T22:03:44 1758578624

Not yet as far as I can tell - might take a while for someone to pull that together given the complexity involved in handling audio and image and text and video at once.