Hacker News

This sorta reminds me of the lie that was pushed when the Snapdragon X laptops were released last year. Qualcomm implied the NPU would be used for LLMs, and I bought into the BS without looking into it. I still use a Snapdragon laptop as my daily driver (it's fine), but for running models locally it's still a joke. Despite Qualcomm's claims about running 13B-parameter models, software like LM Studio only runs on the CPU, with NPU support merely "planned for future updates." The NPU isn't even faster than the CPU for LLMs; it's just more power-efficient for small models, not the big ones people actually want to run. Their GPUs aren't much better for this purpose either. The only hope for LLMs is Vulkan support on the Snapdragon X, which is still half-baked.


AFAIK Windows 11 does use the NPU to run the Phi Silica language models, and this is available to any app through some API. The models are quite small, as you said, though.


Agreed. Same with Intel's NPUs. I've been testing with my Intel Core Evo 155x. The NPU only runs int8 as well. At least Intel has put a decent amount of effort into the ecosystem.

There are a couple of ways to interface with it: DirectML from Microsoft, and Intel's native API (they provide OpenVINO model conversion to convert normal Python ML models). I've tried ONNX Runtime conversions for both backends with little success. Additionally, the OpenVINO model conversion seems to break the model if the model is small enough.
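For what it's worth, the backend juggling above is usually handled with ONNX Runtime's provider-priority pattern. A minimal sketch: the provider names below follow ONNX Runtime's real conventions ("OpenVINOExecutionProvider", "DmlExecutionProvider"), but the availability check is stubbed here for illustration; in real code the list would come from onnxruntime.get_available_providers().

```python
# Sketch of ONNX Runtime's provider-fallback idea: list execution providers
# in priority order and fall back to the CPU when an accelerator backend
# (OpenVINO for Intel NPU/GPU, DirectML on Windows) is missing.

PREFERRED_PROVIDERS = [
    "OpenVINOExecutionProvider",  # Intel path via OpenVINO
    "DmlExecutionProvider",       # DirectML on Windows
    "CPUExecutionProvider",       # always-present fallback
]

def pick_provider(available: list) -> str:
    """Return the highest-priority provider that is actually available."""
    for provider in PREFERRED_PROVIDERS:
        if provider in available:
            return provider
    return "CPUExecutionProvider"
```

With onnxruntime actually installed, you'd pass the chosen provider to ort.InferenceSession(model_path, providers=[...]) and hope the conversion survived.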

OpenVINO Model Server seems pretty polished and has OpenAI-compatible endpoints.
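If it helps anyone: "OpenAI-compatible" just means a standard chat/completions POST. A stdlib-only sketch of the request body; the endpoint path and model name below are assumptions for a local deployment, not OVMS defaults you can rely on:

```python
import json

# Assumed local OVMS deployment; adjust host, port, and path to your setup.
OVMS_URL = "http://localhost:8000/v3/chat/completions"

def build_chat_request(model: str, prompt: str) -> str:
    """Serialize a minimal OpenAI-style chat completion payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return json.dumps(payload)

# You'd POST this body to OVMS_URL with Content-Type: application/json,
# e.g. via urllib.request, or point an OpenAI client at the local base URL.
body = build_chat_request("phi-3-mini", "Hello")
```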


If you don’t mind me asking, what OS do you use on it?


I use Windows 11. Podman/WSL2 works way better than I thought it would. And when Git Bash was finally (officially) ported, that filled other gaps I was missing in my workflow. Windows ARM Python is still lacking all sorts of stuff, but overall I'm pretty productive on it.

I pre-ordered the Snapdragon X dev kit from Qualcomm, but they ended up delivering only a few units before canceling the whole program. The whole thing turned out to be a hot-mess-express saga. THAT computer was going to be my Debian rig.


AnythingLLM uses the NPU.


Could you provide a pointer to docs for this? It wasn't obvious from an initial read of their docs.


You can find some info about it in the v1.7.2 changelog here: https://docs.anythingllm.com/changelog/v1.7.2



