
> The MoE version with 3b active parameters

~34 tok/s on a Radeon RX 7900 XTX under today's Debian 13.



And VRAM use?


~18.6 GiB, according to nvtop.

ollama 0.6.6 invoked with:

    # server
    OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve

    # client
    ollama run --verbose qwen3:30b-a3b

~19.8 GiB with:

    /set parameter num_ctx 32768
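
For anyone wondering why a larger num_ctx costs more memory: most of the increase should be the KV cache, which grows linearly with context length. Below is a rough back-of-the-envelope sketch in shell arithmetic. The model shape (48 layers, 4 KV heads, head dimension 128) is my assumption for qwen3:30b-a3b and is not stated in this thread, so check it against the model's config. The 34-bytes-per-32-values figure is the q8_0 block layout. Compute buffers and whatever context length the baseline run used are not accounted for, so don't expect this to match the nvtop numbers exactly.

```shell
# Rough KV-cache size estimate for a q8_0 cache.
# ASSUMED model shape (not from the thread; verify against the model config):
layers=48        # transformer layers
kv_heads=4       # key/value heads (grouped-query attention)
head_dim=128     # dimension per head
num_ctx=32768    # context length set in the comment above

# q8_0 stores each block of 32 values in 34 bytes
# (32 int8 values plus one fp16 scale).
elems_per_token=$((2 * layers * kv_heads * head_dim))   # keys + values
bytes_per_token=$((elems_per_token * 34 / 32))
kv_bytes=$((bytes_per_token * num_ctx))
kv_mib=$((kv_bytes / 1024 / 1024))

echo "KV cache at num_ctx=${num_ctx}: ${kv_mib} MiB"
```

Under those assumptions this comes out to 1632 MiB (about 1.6 GiB) for the full 32768-token cache, which is in the same ballpark as the ~1.2 GiB increase reported above once you subtract whatever the smaller baseline context was already using.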


Very nice, should run nicely on a 3090 as well.

TY for this.

Update: wow, it's quite fast: 70-80 tok/s in LM Studio, with a few other applications also using the GPU.



