Would you say you are using the best-in-class speech-to-text libraries at the moment? This space seems to be moving fast: the last time I went down this path, I was sure whisper-cpp was the best.


I'm not sure, tbh. Whisper has been king for a long time now, especially with the CTranslate2 implementation in faster_whisper. But NVIDIA open-sourced Parakeet TDT today, and it immediately went to No. 1 on the Open ASR Leaderboard. I'll have to evaluate these latest models; they look strong.



Tried that one. Quality is great, but generations sometimes fail and it's rather slow. It also needs ~13 GB of VRAM, so it's not my first choice for voice agents, tbh.


Alright, dumb question.

(1) I assume these things can handle multiple languages.

(2) Given (1), can you strip out all the languages you aren't using and speed things up?


Actually, good question.

I'd say probably not. You can't easily "unlearn" things from the model weights (and even if you could, that alone wouldn't help). You could retrain or finetune the model heavily on a single language, but again, that alone does not speed up inference.

To gain speed, you'd have to bring the parameter count down and train the model from scratch on a single language only. That might work, but it's also quite likely to introduce other issues in the synthesis. In a perfect world, the model would put all the "free" parameters that are currently spent on other languages toward better synthesis of the single language it was trained on. That might be true to a certain degree, but it's not exactly how AI parameter scaling works.
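To make the "parameter count, not language count" point concrete: the memory needed just to hold the weights is roughly parameters times bytes per parameter, and that number does not change when you finetune the same architecture on fewer languages. The sketch below is a back-of-envelope estimate only; the 1.5B parameter figure and the "half-size" model are hypothetical, and activations, caches and runtime overhead are ignored.

```python
# Back-of-envelope weight memory: parameters x bytes per parameter.
# Weights only: activations, caches and framework overhead are ignored.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "fp16") -> float:
    """Approximate weight storage in GB (10**9 bytes) for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    multilingual = 1_500_000_000   # same architecture, finetuned or not
    half_size = multilingual // 2  # a smaller model trained from scratch
    for name, n in [("multilingual", multilingual), ("half-size", half_size)]:
        print(f"{name}: {weight_memory_gb(n):.2f} GB at fp16")
```

Finetuning on one language leaves `num_params` unchanged, so this estimate (and, roughly, per-token compute) stays the same; only shrinking the architecture moves it.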


I don't know what I'm talking about, but could you use distillation techniques?


Maybe possible; I haven't looked into that much for Coqui XTTS. What I do know is that the quantized versions of Orpheus sound noticeably worse. I feel audio models are quite sensitive to quantization.
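One intuition for why quantized models can sound worse: quantization replaces each float weight with a small integer times a shared scale, so every weight picks up a rounding error of up to half a step. The sketch below shows generic symmetric per-tensor int8 quantization on a few made-up weights; it is an illustration of the idea, not how any particular Orpheus or XTTS quantized build is actually produced.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= q * scale, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map the integers back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.0123, -0.5, 0.2718, 0.9, -0.0007]  # made-up example weights
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(q)
    print(f"max abs error: {max_err:.6f} (bound: scale/2 = {scale / 2:.6f})")
```

Note how small weights like `-0.0007` collapse to zero: one large outlier sets the scale for the whole tensor, which is one reason real quantization schemes use per-channel or per-block scales.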


Parakeet is English-only. Stick with Whisper.

The core innovation is happening in TTS at the moment.


Yeah, I figured you would know. Thanks for that; bookmarking that ASR leaderboard.



