The audio/TTS space is moving fast. In the last week alone:
NVIDIA – PersonaPlex-7B
Open-source, full-duplex conversational speech model.
Inworld AI – TTS-1.5
Realtime TTS (<250 ms latency), $0.005/min, currently #1 on Artificial Analysis.
Flash Labs – Chroma 1.0
First open-source, end-to-end, real-time speech-to-speech model.
Alibaba Qwen – Qwen3-TTS
Fully open-sourced TTS family: Base, CustomVoice, VoiceDesign.
Kyutai Labs – Pocket TTS
Runs locally on a laptop. No GPU required.
Feels like TTS is hitting the same acceleration moment LLMs had last year.
Realtime, open-source, and local are becoming the default.
Curious what people here are building with this.