ZDisket's comments

Yes. Specifically, the pipeline is text -> phonemizer -> phonemized text -> TTS model -> audio. You just have to modify the phonemizer's dictionary.
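To illustrate the idea, here is a rough sketch of a dictionary-backed phonemizer sitting in front of a TTS model. Everything in it is an assumption for illustration: the `LEXICON` entries, the ARPAbet-style symbols, and the `phonemize`, `synthesize`, and `text_to_speech` names are hypothetical and are not taken from the actual project. The point is only that changing a pronunciation means editing the dictionary, not the model.

```python
# Hypothetical toy lexicon: word -> phoneme string (ARPAbet-like, illustrative only).
LEXICON = {
    "hello": "HH AH L OW",
    "world": "W ER L D",
}


def phonemize(text, lexicon):
    """Look up each word in the lexicon; unknown words are tagged, not guessed."""
    phonemes = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        if not word:
            continue  # skip tokens that were pure punctuation
        phonemes.append(lexicon.get(word, f"<unk:{word}>"))
    return " | ".join(phonemes)


def synthesize(phonemized_text):
    """Stand-in for the TTS model; a real model would return audio samples."""
    return f"<audio for: {phonemized_text}>"


def text_to_speech(text, lexicon):
    """text -> phonemizer -> phonemized text -> TTS model -> audio."""
    return synthesize(phonemize(text, lexicon))


# Changing a pronunciation only touches the dictionary, never the model.
custom = dict(LEXICON)
custom["gif"] = "JH IH F"
```

With the stock lexicon, `phonemize("gif", LEXICON)` falls through to the unknown-word tag; with `custom`, the same call returns the new entry, and the model stand-in is unchanged.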


No multilingual capabilities yet, although that is planned for the next iteration.


I'm working on a voice cloning version of my TTS model, a highly upgraded VITS:

https://x.com/ZDi____/status/2013655958027669958

Right now, I only have single speaker checkpoints (as per the old video). That will change soon.


VITS is such a cool model (and paper): fast, minimal, trainable. Meta took it to the extreme for about 1000 languages.

It seems like you have been working on this application for some time. I will go through your code, but could you provide some context about the upgrades/changes you have made, or a post describing your efforts?

Cool nonetheless!


I'll explain in detail once I've got the big release, but everything's been thoroughly modernized. Transformer, HiFi-GAN (now iSTFTNet w/ Snake) vocoder, etc., plus a few additions.


Recommendations for a local text-to-speech synth? Last year I played with Piper-TTS, Chatterbox, and some others. Ideally supporting English, Spanish, and Chinese.


Multilingual and local? Try out Supertonic 2.


Upwork has candidates buy "connects" with real money that are spent when applying to jobs. Ultimately it seems some form of payment is a proven gate.

