I didn't see a mention of languages in the readme. Does this understand language...

nkaz123 · on May 14, 2024

There are two models in use here: Whisper tiny for transcribing audio, and Llama 3 for responding.

Whisper tiny is multilingual (though I am using the English-specific variant), and I believe Llama 3 is technically capable of multilingual output, but I'm not aware of any benchmarks.
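
For anyone who wants to experiment with other languages, here is a minimal sketch of choosing between the two Whisper variants. It assumes the checkpoint names from OpenAI's Whisper release (`tiny.en` for English-only, `tiny` for multilingual); the helper `pick_whisper_checkpoint` is hypothetical, and how this project actually loads its model is not shown here.

```python
def pick_whisper_checkpoint(language: str, size: str = "tiny") -> str:
    """Return a Whisper checkpoint name for a language (hypothetical helper).

    Assumes OpenAI's naming: "<size>.en" is the English-only variant and
    "<size>" is the multilingual one. The default size here is "tiny".
    """
    lang = language.strip().lower()
    if lang in ("en", "english"):
        return f"{size}.en"  # English-only variant
    return size  # multilingual variant


if __name__ == "__main__":
    for lang in ("en", "it", "de"):
        print(lang, pick_whisper_checkpoint(lang))
```

This only covers the transcription side; the response model would still need to handle the same language.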

I think it could be made better, but for now the focus is English. I'll add this to the README, though. Thanks!

timendum · on May 14, 2024

The suggested model for vision capabilities is English-only.