Can you explain more about the "Coqui XTTS Lasinya" models that the code is using? What are these, and how were they trained/finetuned? I'm assuming you're the one who uploaded them to huggingface, but there's no model card or README https://huggingface.co/KoljaB/XTTS_Models
Yeah I really dislike the whisperiness of this voice "Lasinya". It sounds too much like an erotic phone service. I wonder if there's any alternative voice? I don't see Lasinya even mentioned in the public coqui models: https://github.com/coqui-ai/STT-models/releases . But I don't see a list of other model names I could use either.
I tried to select kokoro in the python module but it says in the logs that only coqui is available. I do have to say the coqui models sound really good, it's just the type of voice that puts me off.
The default prompt is also way too "girlfriendy" but that was easily fixed. But for the voice, I simply don't know what the other options are for this engine.
PS: Forgive my criticism of the default voice but I'm really impressed with the responsiveness of this. It really responds so fast. Thanks for making this!
Create a subfolder in the app container: ./models/some_folder_name
Copy the files from your desired voice into that folder: config.json, model.pth, vocab.json and speakers_xtts.pth (you can copy the speakers_xtts.pth from Lasinya, it's the same for every voice)
Then change the specific_model="Lasinya" line in audio_module.py into specific_model="some_folder_name".
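The three steps above could be sketched as a small helper. This is just an illustration of the manual copy, not part of the repo's code; the function name and path layout are assumptions based on this thread.

```python
# Hypothetical helper mirroring the manual steps above: copy an XTTS voice's
# files into ./models/<voice_name>, so it can then be selected by changing
# specific_model="Lasinya" to specific_model="<voice_name>" in audio_module.py.
import shutil
from pathlib import Path

# The four files mentioned above (speakers_xtts.pth is the same for every voice).
REQUIRED_FILES = ["config.json", "model.pth", "vocab.json", "speakers_xtts.pth"]

def install_voice(src_dir: str, models_dir: str, voice_name: str) -> Path:
    """Copy the XTTS voice files from src_dir into models_dir/voice_name."""
    dest = Path(models_dir) / voice_name
    dest.mkdir(parents=True, exist_ok=True)
    for fname in REQUIRED_FILES:
        src = Path(src_dir) / fname
        if not src.exists():
            raise FileNotFoundError(f"missing {fname} in {src_dir}")
        shutil.copy2(src, dest / fname)
    return dest
```

After copying, the remaining step is still the one-line edit in audio_module.py pointing specific_model at the new folder name.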
If you change TTS_START_ENGINE to "kokoro" in server.py it's supposed to work, what does happen then? Can you post the log message?
I didn't realise that you custom-made that voice. Would you have links to other out-of-the-box voices for Coqui? I'm having trouble finding them. From the demo page, it seems the idea with that engine is that you clone someone else's voice, because I don't see any voices listed. I've never seen it before.
And yes, I switched to Kokoro now. I thought it was the default already, but then I saw there were 3 lines configuring the same thing. So that's working. Kokoro isn't quite as good as Coqui, though, which is why I'm asking. I also used Kokoro on Open WebUI and wasn't very happy with it there either. It's fast, but some pronunciation is weird. Also, it would be amazing to have bilingual TTS (English/Spanish in my case), and it looks like Coqui might be able to do that.
I haven't found many Coqui finetunes so far either. I have David Attenborough and Snoop Dogg finetunes on my Hugging Face; the quality is medium.
Coqui can do 17 languages. The problem with the RealtimeVoiceChat repo is turn detection: the model I use to determine whether a partial sentence indicates a turn change is trained on an English corpus only.
In case it's not clear, I'm talking about the models referenced here. https://github.com/KoljaB/RealtimeVoiceChat/blob/main/code/a...