ZDisket's comments

ZDisket · 2026-03-19T22:14:22

Yes. Specifically, the pipeline is text -> phonemizer -> phonemized text -> TTS model -> audio. You just have to modify the phonemizer's dictionary.
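To make the dictionary-override idea concrete, here is a minimal Python sketch of that pipeline, assuming a simple word-level lexicon. Everything in it is hypothetical: `BASE_LEXICON`, `phonemize`, `synthesize`, and `tts_pipeline` are illustrative names (not ZDisket's code or any phonemizer library's API), the entries are CMUdict-style ARPAbet strings, and `synthesize` is only a placeholder for the TTS model.

```python
# Hypothetical sketch of: text -> phonemizer -> phonemized text -> TTS model -> audio.
# Names and lexicon entries are illustrative, not from any real project.

# Base pronunciation dictionary (CMUdict-style ARPAbet, one string per word).
BASE_LEXICON = {
    "hello": "HH AH0 L OW1",
    "world": "W ER1 L D",
    "tomato": "T AH0 M EY1 T OW2",
}


def phonemize(text, lexicon, overrides=None):
    """Convert text to phonemes word by word; overrides take priority over the lexicon."""
    merged = {**lexicon, **(overrides or {})}  # base lexicon is never mutated
    phonemes = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'")
        if not word:
            continue
        # Unknown words fall back to their spelling here; a real phonemizer
        # would run a grapheme-to-phoneme model instead.
        phonemes.append(merged.get(word, word))
    return " ".join(phonemes)


def synthesize(phonemes):
    """Placeholder for the TTS model: returns silence whose length scales with the input."""
    return [0.0] * (len(phonemes) * 160)


def tts_pipeline(text, overrides=None):
    """Run the full pipeline; only the phonemizer ever sees the overrides."""
    return synthesize(phonemize(text, BASE_LEXICON, overrides))


if __name__ == "__main__":
    print(phonemize("Hello, world!", BASE_LEXICON))  # -> HH AH0 L OW1 W ER1 L D
    print(phonemize("tomato", BASE_LEXICON, {"tomato": "T AH0 M AA1 T OW2"}))  # -> T AH0 M AA1 T OW2
```

Because the overrides are merged on top of the base lexicon, changing a pronunciation amounts to adding one dictionary entry; the TTS model itself is untouched, since it only ever sees phonemes.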

ZDisket · 2026-03-19T22:13:05

No multilingual capabilities yet, although that is planned for the next iteration.

ZDisket · 2026-03-09T04:47:41

I'm working on a voice cloning version of my TTS model, a highly upgraded VITS:

https://x.com/ZDi____/status/2013655958027669958

Right now, I only have single-speaker checkpoints (as per the old video). That will change soon.

warangal · 2026-03-09T09:29:43

VITS is such a cool model (and paper): fast, minimal, trainable. Meta took it to the extreme with about 1,000 languages.

It seems like you have been working on this application for some time. I will go through your code, but could you provide some context about the upgrades/changes you have made, or a post describing your efforts?

Cool nonetheless!

ZDisket · 2026-03-09T16:45:21

I'll explain in detail once the big release is out, but everything has been thoroughly modernized: the Transformer, the vocoder (formerly HiFi-GAN, now iSTFTNet with Snake activations), et al., plus a few additions.
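For readers unfamiliar with the "Snake" part: Snake is a periodic activation function, snake(x) = x + (1/α)·sin²(αx), used in vocoders such as BigVGAN. The plain-Python sketch below illustrates that formula only. How ZDisket's iSTFTNet parametrizes α (per channel, learned, log-scale, and so on) is not stated in the comment, so the per-channel helper `snake_channels` is an assumption for illustration.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + (1/alpha) * sin^2(alpha * x)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return x + math.sin(alpha * x) ** 2 / alpha


def snake_channels(channels, alphas):
    """Apply Snake per channel (hypothetical layout).

    `channels` is a list of channels, each a list of samples, and `alphas`
    holds one alpha per channel; in trained vocoders alpha is typically a
    learned parameter rather than a fixed constant.
    """
    return [[snake(s, a) for s in ch] for ch, a in zip(channels, alphas)]


if __name__ == "__main__":
    print(snake(0.0))  # -> 0.0
    print(snake_channels([[0.0, 1.0], [0.5]], [2.0, 1.0]))
```

Since the derivative is 1 + sin(2αx) ≥ 0, the function is monotonically non-decreasing, while the sin² term adds the periodic inductive bias that makes it attractive for waveform generation.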

dv35z · 2026-03-09T05:45:05

Any recommendations for local text-to-speech synthesis? Last year I played with Piper-TTS, Chatterbox, and some others. Ideally it would support English, Spanish, and Chinese.

ZDisket · 2026-03-09T05:55:09

Multilingual and local? Try out Supertonic 2.

ZDisket · 2026-03-08T01:31:45

Upwork has candidates buy "connects" with real money, which are then spent when applying to jobs. Ultimately, it seems some form of payment is a proven gate.