
There is the art of subtitling, and then there is the technical reality: sometimes you have content with no subtitles and just want a solution now, because it didn't come with an SRT (or better yet, a VTT) and OpenSubtitles has no match.

They're using Whisper for speech to text, and some other small model for basic translation where necessary. It will not do speaker identification (diarization), and it certainly isn't going to probe into narrative plot points to figure out whether naming a character is a reveal. It isn't going to place text on the screen according to the speaker's position in the frame, or where it would be least intrusive. It's just going to show a best-effort speech-to-text transcript in a fixed area of the screen, as a last resort where the alternative is nothing.
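To make the "fixed area, best effort" idea concrete, here is a rough sketch of turning timed transcript segments into a VTT file. It is not their implementation. It assumes segments shaped like the dicts the open-source whisper Python package returns (`start` and `end` in seconds, plus `text`), and `format_timestamp` and `segments_to_vtt` are names made up for illustration. No cue position settings are written, so a player renders every line in its default subtitle area.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_vtt(segments) -> str:
    """Build a WebVTT document from dicts with 'start', 'end' and 'text' keys.

    Assumption: the segment shape mirrors speech-to-text output such as
    Whisper's. There is no speaker labelling and no positioning; each cue
    is just a time range plus the recognized text.
    """
    lines = ["WEBVTT", ""]
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty recognitions rather than emit blank cues
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} --> {end}")
        lines.append(text)
        lines.append("")  # a blank line terminates the cue
    return "\n".join(lines)
```

Everything the hand-crafted approach adds (speaker names, placement, timing tuned to the scene) would have to be layered on top of output like this by a person.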

Obviously the preferred option is carefully crafted subtitles from the content creator, translated if the desired language isn't available but still using all the cues and positions. Second best is carefully crafted community subtitles from OpenSubtitles or the like, maybe where someone used "AI" and then hand-positioned, corrected, and updated the result. Failing all that, you fall back to this.


