"... A comprehensive evaluation was performed on a suite of models, including Qwen3-Omni-30B-A3B-
Instruct, Qwen3-Omni-30B-A3B-Thinking, and two in-house developed variants, designated Qwen3-
Omni-Flash-Instruct and Qwen3-Omni-Flash-Thinking. These “Flash” models were designed to improve
both computational efficiency and performance efficacy, integrating new functionalities, notably the
support for various dialects. ..."
LLM and TTS latency gets measured and logged at the start. It's around 220 ms for the LLM to return the first synthesizable sentence fragment (depending on the length of the fragment, which is usually somewhere between 3 and 10 words). Then around 80 ms of TTS until the first audio chunk is delivered. STT with base.en you can neglect, it's under 5 ms, VAD same. The turn detection model also adds around 20 ms.
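To put those numbers together, here's a rough back-of-the-envelope sum. It's only a sketch: the stage names are made up for illustration, the "under 5 ms" stages are entered at their 5 ms upper bound, and it assumes the stages run strictly one after another with no overlap, which may not match the real pipeline.

```python
# Rough per-stage latency budget in milliseconds, taken from the figures above.
# Stage names are hypothetical labels; "under 5 ms" stages use the 5 ms upper bound.
STAGE_LATENCY_MS = {
    "vad": 5,
    "turn_detection": 20,
    "stt_base_en": 5,
    "llm_first_fragment": 220,
    "tts_first_chunk": 80,
}


def time_to_first_audio_ms(stages):
    """Sum the stage latencies, assuming the stages run one after another."""
    return sum(stages.values())


total_ms = time_to_first_audio_ms(STAGE_LATENCY_MS)
print(f"Estimated time to first audio: ~{total_ms} ms")
for name, ms in STAGE_LATENCY_MS.items():
    # Share of the total budget spent in each stage.
    print(f"  {name}: {ms} ms ({ms / total_ms:.0%})")
```

Under those assumptions the LLM's first fragment takes up about two thirds of the budget, so that's where any further speedup would matter most.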
I have zero clue if and how fast this runs on a Mac.
1. What GPU did you use to train the model? I'd love to train a model like this, but currently I only have a 16GB MacBook. Thinking about buying a 5090 if it's worth it.
2. Is it possible to use this for real time audio generation, similar to the demo on the Sesame website?
This might be a game changer for learning English.
I'm from a developing country and it's sad that most English teachers in public schools here can't speak English well. There are good English teachers, but they are expensive and not affordable for the average person.
OpenAI realtime models are good, but we can't deploy them to the masses since they're very expensive.
This model might be able to solve the issue since it's better or on par with the OpenAI model, yet it's significantly cheaper since it's a fairly small model.
I don't have time to watch him regularly, but I really enjoy that he can both walk the walk as well as pontificate on a variety of programming topics.
There are a lot of coding streamers/influencers who don't have the breadth and depth he does, nor the high level experience at a top tech company. They either did a year at Meta and became full time streamers, or are purely competitive programmers, but ThePrimeagen hits the sweet spot for me.
I think he's a great example for lots of engineers everywhere, especially those who are just getting into the industry. He didn't have an easy path of just getting a FAANG internship and job right out of college either; he had to work his way up to where he is now.
Theo t3 excels at this one. He really finds the balance when talking about the opposing views of different people without being overly opinionated about it.
There are some strong views about ThePrimeagen in this thread. Just want to share my experience. I've spent tonnes of time configuring Vim, Neovim etc. It's ... a ... mess.
I love both editors and for the life of me, I don't know why configuring them is soooo hard and brittle.
ThePrimeagen's videos on Vim/Neovim are by far the most information-dense videos. It took me some time to ignore his style of presentation and just focus on the content. However, the value I got out of watching his videos is undeniable. Knowing his background a bit and how he battled addiction gave me some context. (Sorry, can't find that video on his channel now)
Continuing on this topic of Vim/Neovim ...
Leeren Chen (https://www.youtube.com/@leeren_) is a pure genius on the topic of configuring Vim. I've never seen another person like him on YouTube who uses Vimscript to configure Vim to make it work like an IDE (almost).
TJ Devries' (https://www.youtube.com/@teej_dv) videos on Neovim are awesome too (he's a core dev of Neovim). But there are a lot of gimmicks in his videos and that can put people off. His videos with @BashBunni are very approachable in terms of learning about and configuring Neovim.
He's a troll and an authoritarian.
At first he was kind-of entertaining, but he's full of himself and thinks he knows it all.
And his stupid, nonsensical editor wars.
The more I watched him, the more I disliked him, and eventually I unfollowed.
And his noob topics are boring as well.
He's pretty much the average HN user, only more experienced.
He believes the hype.
Like htmx is hyped so he does htmx.
His streams provide nothing for me in terms of new information that is useful.
I've been following the scene of Mac gaming for a while. Isaac, the author of this project, is someone who contributes a lot to this space. He created Whisky and contributed to Ryujinx, a Switch emulator that works on Mac, and Playcover, a way to play iOS apps and games on Mac. Also, he's still 17 years old. [0]
This is a tiny detail but boy do I appreciate that he actually bothered to use the real toolkit (AppKit, SwiftUI, whatever) for the GUI… I am so, so tired of projects like Docker Desktop that cram basic UIs into Electron and think that's somehow "fine."
AppKit (or SwiftUI I guess) is fairly simple. IB makes having a correctly behaving UI super easy, far more so than jumping through CSS hoops, and your app will behave correctly on macOS, which Electron apps do not.
Electron apps are only easier if you don’t care about good UI, just “cheap cross platform”. It’s even more obnoxious when many of these “apps” can be loaded in Safari and then interact properly with the rest of the system.
Electron apps are lowest common denominator apps where the primary goal is cheap rather than good. There is no user-facing metric by which an Electron app is superior to a native one: they’re slower, bigger, use vastly more memory, and result in drastically worse battery life.
I wonder how hard it is to turn the open-source model into a realtime model.