I appreciate the work that went into slimming this binary down, but it's a ~negligible amount of work compared to llama.cpp itself.
HN is inundated with posts building xyz on top of the x.cpp ecosystem. Whilst I appreciate the excitement, I wish more people would explore the low level themselves! We can be much more creative in this new playground.
I’ve been developing a Rust + WebGPU ML framework for the past 6 months, and I’ve quickly learned just how impressive GG’s work is.
It’s still early stages, but you can check it out here: https://www.ratchet.sh/ https://github.com/FL33TW00D/whisper-turbo
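For anyone curious what exploring that lower level actually looks like, here's a rough sketch of the boilerplate to get a compute-capable GPU device from Rust with the wgpu crate (a generic illustration assuming the wgpu 0.19-style API and the pollster crate for blocking on async setup; this is not Ratchet's actual code):

    // Minimal wgpu setup sketch: grab an adapter, then a device + queue.
    // From here you allocate buffers and dispatch your own WGSL compute shaders.
    async fn setup_gpu() -> (wgpu::Device, wgpu::Queue) {
        let instance = wgpu::Instance::default();
        let adapter = instance
            .request_adapter(&wgpu::RequestAdapterOptions::default())
            .await
            .expect("no suitable GPU adapter found");
        adapter
            .request_device(&wgpu::DeviceDescriptor::default(), None)
            .await
            .expect("failed to create device")
    }

    fn main() {
        // pollster is a tiny executor commonly used to block on wgpu's async setup.
        let (device, _queue) = pollster::block_on(setup_gpu());
        println!("max workgroup size x: {}", device.limits().max_compute_workgroup_size_x);
    }

That's essentially the whole handshake; everything after it is buffers, bind groups, and shader code.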