
Wow, this is a “holy shit” moment for Rust in AI applications if this works as described. Also, so long Mojo!

EDIT:

Looks like I’m wrong, but I appreciate getting schooled by all the HNers with low-level expertise. Lots to go and learn about now.



It's "just" a port of GGML (written in C++) to wasm with some additional Rust code.


Right, but if the port achieves performance gains over GGML, which is already highly performant, that's (a) wild and (b) a signal to move further GGML development into Rust, no?


How would wasm/rust be more performant than C++? I'm not sure the wasm version can take advantage of AVX/Metal.

Edit: the wasm runtime's installer does take advantage of them by installing plugins.

Unless you’re talking about performance on devices where those two weren’t a thing anyways.


As far as I understand, only the "driver" code is in Rust. Everything else is just C++ compiled to WASM. Maybe it's slightly better to have the driver code be in Rust than Python or Scheme or whatever, but I imagine C++ would be basically equivalent (and you wouldn't have to go through the trouble of compiling to WASM, which likely loses significant performance).


That's what I find weird here. The bit of code written in Rust is almost comically tiny, and the rest is just C++ that someone else already wrote, compiled to WASM. I think comparing this to a Python wrapper for the same code would show very minimal difference in performance, because the vast majority of the time is spent in the compiled inference code, and formatting the prompt string really isn't that complex of a task. I just don't see what advantage Rust produces here, other than the fact that it's a language you can compile to WASM so that you have one binary.
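To illustrate how small the prompt-formatting side is: assembling a llama2-style chat prompt is only a handful of lines of Rust. This is a hypothetical sketch; the helper name and the exact template handling are illustrative, not taken from the project.

    // Hypothetical sketch of llama2-style chat-prompt assembly; the helper
    // name and template details are illustrative, not the project's code.
    fn format_prompt(system: &str, history: &[(String, String)], user: &str) -> String {
        // The system prompt sits inside the first [INST] block.
        let mut prompt = format!("<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n");
        for (question, answer) in history {
            // Each past turn is a closed [INST] ... [/INST] pair followed by its answer.
            prompt.push_str(&format!("{question} [/INST] {answer} </s><s>[INST] "));
        }
        // The latest user message is left open for the model to complete.
        prompt.push_str(&format!("{user} [/INST]"));
        prompt
    }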


ML has extremely predictable and heavily optimized routines. Languages that can target the hardware ISA all tend to have comparable perf, and there's no reason to think Rust would offer much.


There is no mention of it running faster than the original llama2.cpp; if anything, it is slower.


No it's not. This does nothing to minimize the size of the models that inference is being run on. It's cool for edge applications, kind of. And Rust is already a go-to tool for edge.


> this is a “holy shit” moment for Rust in AI applications

Yeah, because I realized the 2 MB is just a wrapper that reads stdin and offloads everything to the wasi-nn API.

> The core Rust source code is very simple. It is only 40 lines of code. The Rust program manages the user input, tracks the conversation history, transforms the text into the llama2’s chat template, and runs the inference operations using the WASI NN API.

You can do the same using Python with fewer lines of code and maybe a smaller executable.
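For a sense of scale, here is a rough sketch of what such a stdin-to-WASI-NN driver loop could look like, written against the wasmedge_wasi_nn crate from memory; treat the exact calls as an approximation rather than the project's actual 40 lines.

    // Rough sketch of a WASI-NN driver loop, assuming the wasmedge_wasi_nn
    // crate; an approximation, not the project's actual source.
    use std::io::{self, BufRead, Write};
    use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

    fn main() {
        // Load the GGML model the host runtime preloaded under the name "default".
        let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
            .build_from_cache("default")
            .expect("failed to load model");
        let mut ctx = graph.init_execution_context().expect("failed to init context");

        let stdin = io::stdin();
        for line in stdin.lock().lines() {
            let prompt = line.expect("failed to read stdin");
            // Hand the formatted prompt to the host-side inference engine.
            ctx.set_input(0, TensorType::U8, &[1], prompt.as_bytes())
                .expect("set_input failed");
            ctx.compute().expect("compute failed");

            // Read back the generated text.
            let mut output = vec![0u8; 4096];
            let n = ctx.get_output(0, &mut output).expect("get_output failed");
            println!("{}", String::from_utf8_lossy(&output[..n]));
            io::stdout().flush().unwrap();
        }
    }

All the heavy lifting happens on the other side of that wasi-nn boundary, which is the point being made above.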


Pretty damning if 40 lines of Rust to read stdin generate a 2 MB binary!


Presumably that also accounts for the WASM itself


Indeed. I hope it does include the WASM VM.


Yeah, excited to see how this will evolve. BTW, maybe give it a try on your Mac and see how it performs.



