
Wow, this is a “holy shit” moment for Rust in AI applications if this works as described. Also, so long Mojo!

EDIT:

Looks like I’m wrong, but I appreciate getting schooled by all the HNers with low-level expertise. Lots to go and learn about now.



It's "just" a port of GGML (written in C++) to wasm with some additional Rust code.


Right, but if the port achieves performance gains over GGML, which is already highly performant, that's (a) wild and (b) a signal to move further GGML development into Rust, no?


How would wasm/rust be more performant than C++? I'm not sure the wasm version can take advantage of AVX/Metal.

Edit: the wasm runtime's installer does take advantage of them by installing plugins.

Unless you’re talking about performance on devices where those two weren’t a thing anyways.


As far as I understand, only the "driver" code is in Rust. Everything else is just C++ compiled to WASM. Maybe it's slightly better to have the driver code be in Rust than Python or Scheme or whatever, but I imagine C++ would be basically equivalent (and you wouldn't have to go through the trouble of compiling to WASM, which likely loses significant performance).


That's what I find weird here. The bit of code written in Rust is almost comically tiny, and the rest is just C++ that someone else already wrote, compiled to WASM. I think comparing this to a Python wrapper for the same code would show very minimal difference in performance, because the vast majority of the time is spent in the compiled inference code, and formatting the prompt string really isn't that complex of a task. I just don't see what advantage Rust produces here, other than the fact that it's a language you can compile to WASM so that you have one binary.
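To illustrate how small the prompt-formatting side is: assembling a llama2-style chat prompt is only a handful of lines of Rust. This is a hypothetical sketch; the helper name and the exact template handling are illustrative, not taken from the project.

    // Hypothetical sketch of llama2-style chat-prompt assembly; the helper
    // name and template details are illustrative, not the project's code.
    fn format_prompt(system: &str, history: &[(String, String)], user: &str) -> String {
        // The system prompt sits inside the first [INST] block.
        let mut prompt = format!("<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n");
        for (question, answer) in history {
            // Each past turn is a closed [INST] ... [/INST] pair followed by its answer.
            prompt.push_str(&format!("{question} [/INST] {answer} </s><s>[INST] "));
        }
        // The latest user message is left open for the model to complete.
        prompt.push_str(&format!("{user} [/INST]"));
        prompt
    }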


ML has extremely predictable and heavily optimized routines. Languages that can target the hardware ISA all tend to have comparable perf, and there's no reason to think Rust would offer much.


There is no mention of it running faster than the original llama2.cpp; if anything, it is slower.


No it's not. This does nothing to minimize the size of the models that inference is being run on. It's cool for edge applications, kind of. And Rust is already a go-to tool for edge.


> this is a “holy shit” moment for Rust in AI applications

Yeah, because I realized the 2 MB is just a wrapper that reads stdin and offloads everything to the wasi-nn API.

> The core Rust source code is very simple. It is only 40 lines of code. The Rust program manages the user input, tracks the conversation history, transforms the text into the llama2’s chat template, and runs the inference operations using the WASI NN API.

You can do the same using Python with fewer lines of code and maybe a smaller executable.
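For a sense of scale, here is a rough sketch of what such a stdin-to-WASI-NN driver loop could look like, written against the wasmedge_wasi_nn crate from memory; treat the exact calls as an approximation rather than the project's actual 40 lines.

    // Rough sketch of a WASI-NN driver loop, assuming the wasmedge_wasi_nn
    // crate; an approximation, not the project's actual source.
    use std::io::{self, BufRead, Write};
    use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

    fn main() {
        // Load the GGML model the host runtime preloaded under the name "default".
        let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
            .build_from_cache("default")
            .expect("failed to load model");
        let mut ctx = graph.init_execution_context().expect("failed to init context");

        let stdin = io::stdin();
        for line in stdin.lock().lines() {
            let prompt = line.expect("failed to read stdin");
            // Hand the formatted prompt to the host-side inference engine.
            ctx.set_input(0, TensorType::U8, &[1], prompt.as_bytes())
                .expect("set_input failed");
            ctx.compute().expect("compute failed");

            // Read back the generated text.
            let mut output = vec![0u8; 4096];
            let n = ctx.get_output(0, &mut output).expect("get_output failed");
            println!("{}", String::from_utf8_lossy(&output[..n]));
            io::stdout().flush().unwrap();
        }
    }

All the heavy lifting happens on the other side of that wasi-nn boundary, which is the point being made above.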


Pretty damning if 40 lines of Rust to read stdin generate a 2 MB binary!


Presumably that also accounts for the WASM itself


Indeed. I hope it does include the WASM VM.


Yeah, excited to see how this will evolve. BTW, maybe give it a try on your Mac and see how it performs.



