There are two models in use here: Whisper tiny for transcribing audio, and Llama 3 for responding.
Whisper tiny is multilingual (though I am using the English-specific variant), and I believe Llama 3 is technically capable of multilingual output, but I'm not sure of any benchmarks.
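For anyone who wants to experiment with other languages, here is a rough sketch of how the Whisper variant could be picked. It only builds the checkpoint name, following the upstream Whisper naming scheme where English-only checkpoints end in `.en`. The helper `whisper_checkpoint` is hypothetical and not part of this project, and the runtime used here may name or load its model files differently.

```python
# Sizes for which upstream Whisper publishes an English-only ".en" checkpoint.
SIZES_WITH_ENGLISH_VARIANT = {"tiny", "base", "small", "medium"}


def whisper_checkpoint(size: str = "tiny", english_only: bool = True) -> str:
    """Return a Whisper checkpoint name (hypothetical helper, for illustration)."""
    if english_only:
        if size not in SIZES_WITH_ENGLISH_VARIANT:
            raise ValueError(f"no English-only checkpoint for size {size!r}")
        # English-only checkpoints carry a ".en" suffix.
        return f"{size}.en"
    # Multilingual checkpoints use the bare size name.
    return size


print(whisper_checkpoint())                    # English-only tiny, as used now
print(whisper_checkpoint(english_only=False))  # multilingual tiny
```

With the `openai-whisper` Python package, the resulting name would be passed to `whisper.load_model(...)`; whether that applies here depends on which Whisper implementation the project uses.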
I think multilingual support could be made better, but for now the focus is English. I'll add this to the README though. Thanks!