Actually, speech-to-text benefits massively from a good language model. It's impossible to do speech-to-text if you don't understand the language. The better you understand the language and the context of what is being said, the better you will be at speech-to-text. So it's no surprise whatsoever that the best-in-class language model would also have the best-in-class speech-to-text.
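
To make that concrete, here is a toy sketch of how a language model can break a tie that the audio alone gets wrong. Everything in it is an assumption for illustration: the two candidate transcripts, the made-up probabilities, and the helper name `best_transcript`. It is not how any particular system works, just the general idea of adding a language-model score to an acoustic score.

```python
import math

# Hypothetical acoustic log-probabilities: how well each candidate matches the audio.
# The numbers are invented for illustration; the two candidates sound nearly identical.
ACOUSTIC_LOGP = {
    "recognize speech": math.log(0.45),
    "wreck a nice beach": math.log(0.55),
}

# Hypothetical language-model log-probabilities: how plausible each candidate is as English.
LM_LOGP = {
    "recognize speech": math.log(1e-4),
    "wreck a nice beach": math.log(1e-7),
}


def best_transcript(acoustic, lm, lm_weight=1.0):
    """Return the candidate maximizing acoustic + lm_weight * lm log-probability."""
    return max(acoustic, key=lambda text: acoustic[text] + lm_weight * lm[text])


# With the language model switched off, the slightly better acoustic match wins.
acoustic_only = best_transcript(ACOUSTIC_LOGP, LM_LOGP, lm_weight=0.0)

# With the language model switched on, the far more plausible sentence wins.
with_lm = best_transcript(ACOUSTIC_LOGP, LM_LOGP, lm_weight=1.0)

print(acoustic_only)
print(with_lm)
```

The stronger the language model, the more of these near-ties it resolves correctly, which is the whole point.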

I think a lot of people underestimate how disconnected simple sound patterns are from human speech. It's hard, if not impossible, to even recognize word boundaries on a phonogram of regular human speech, even for highly eloquent speakers in formal settings. And many sounds are entirely ambiguous; people are rarely aware of the exact phonemes they use in practice. For example, most native English speakers pronounce the "peech" part of "speech" more like "beach" than like "peach", if you look at a phonogram [0]. Phonetics is really complicated, and varies far more between languages than people tend to assume.

[0] https://www.youtube.com/watch?v=U37hX8NPgjQ


