Right, but if the port achieves performance gains over GGML, which is already highly performant, that's (a) wild and (b) a signal to move further GGML development into Rust, no?
As far as I understand, only the "driver" code is in Rust. Everything else is just C++ compiled to WASM. Maybe it's slightly better to have the driver code be in Rust than Python or Scheme or whatever, but I imagine C++ would be basically equivalent (and you wouldn't have to go through the trouble of compiling to WASM, which likely loses significant performance).
That's what I find weird here. The bit of the code written in Rust is almost comically tiny, and the rest is just C++ someone else already wrote, compiled to WASM. I think comparing this to a Python wrapper around the same code would show a very minimal difference in performance, because the majority of the time goes into inference itself, and formatting the prompt string really isn't that complex a task. I just don't see what advantage Rust produces here, other than being a language you can compile to WASM so that you end up with one binary.
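To make the "isn't that complex" point concrete: the llama2 chat template is basically string concatenation. A rough sketch in Rust, following the widely used `[INST]` / `<<SYS>>` convention (the function name and signature are made up for illustration, and exact markers vary by fine-tune):

```rust
/// Wrap a system prompt, prior turns, and the new user message in a
/// llama2-style chat template. Illustrative only; marker details
/// differ between model fine-tunes.
fn format_prompt(system: &str, history: &[(String, String)], user: &str) -> String {
    let mut prompt = format!("<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n");
    for (question, answer) in history {
        // Close off each completed turn and open a new [INST] block.
        prompt.push_str(&format!("{question} [/INST] {answer} </s><s>[INST] "));
    }
    prompt.push_str(&format!("{user} [/INST]"));
    prompt
}
```

That is essentially the whole job of the "driver" besides shuttling bytes to the inference API, which is why the wrapper language barely matters for performance.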
ML has extremely predictable and heavily optimized routines. Languages that can target the hardware ISA directly all tend to have comparable perf, and there's no reason to think Rust would offer much here.
No, it's not. This does nothing to minimize the size of the models on which inference is being run. It's cool for edge applications, kind of. And Rust is already a go-to tool for the edge.
> this is a “holy shit” moment for Rust in AI applications
Yeah, because I realized the 2MB binary is just a wrapper that reads stdin and offloads everything to the wasi-nn API.
> The core Rust source code is very simple. It is only 40 lines of code. The Rust program manages the user input, tracks the conversation history, transforms the text into the llama2’s chat template, and runs the inference operations using the WASI NN API.
You can do the same using Python with fewer lines of code and maybe a smaller executable size.
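For reference, here's roughly what a 40-line driver like that boils down to. This is a minimal sketch assuming the wasmedge-wasi-nn crate's builder API; the "default" model name, the 4 KB output buffer, and the lack of chat-history handling are my simplifications, not the article's actual code:

```rust
use std::io::{self, BufRead, Write};
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() {
    // All the heavy lifting (GGML kernels, quantized weights) sits behind
    // this call; "default" is a placeholder name for a model preloaded by
    // the host runtime.
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load model");
    let mut ctx = graph
        .init_execution_context()
        .expect("failed to create execution context");

    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let prompt = line.expect("failed to read stdin");
        // Hand the raw UTF-8 prompt across the wasi-nn boundary;
        // tokenization and inference happen on the host side.
        ctx.set_input(0, TensorType::U8, &[1], prompt.as_bytes())
            .expect("failed to set input");
        ctx.compute().expect("inference failed");

        let mut out = vec![0u8; 4096]; // placeholder output buffer size
        let n = ctx.get_output(0, &mut out).expect("failed to read output");
        io::stdout().write_all(&out[..n]).expect("failed to write stdout");
        println!();
    }
}
```

Everything performance-sensitive lives on the other side of those wasi-nn calls, so a Python wrapper doing the same thing would indeed perform about the same.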
EDIT:
Looks like I’m wrong, but I appreciate getting schooled by all the HNers with low-level expertise. Lots to go and learn about now.