Hacker News: karimf's comments

Thank you. I'm looking for this as well. The realtime model is a closed-source model and it's different than the open Qwen3-Omni-30B-A3B, right?

I wonder how hard it is to turn the open-source model into a realtime model.


Why do you say they are different models? I've been looking at this today and haven't seen anything explicitly state that.


This is just my assumption given that they listed a lot of different models here: https://modelstudio.console.alibabacloud.com/?spm=a3c0i.2876...

This is an older link, but they listed two different sections here, commercial and open source models: https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...

For the realtime multimodal, I'm not seeing the open source models tab: https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...


Is the Qwen3-Omni-Flash the same as Qwen3-Omni-30B-A3B, or is the Omni-Flash a different closed-source model?


In Section 5 of their [technical report](https://arxiv.org/pdf/2509.17765v1), they mention them:

"... A comprehensive evaluation was performed on a suite of models, including Qwen3-Omni-30B-A3B-Instruct, Qwen3-Omni-30B-A3B-Thinking, and two in-house developed variants, designated Qwen3-Omni-Flash-Instruct and Qwen3-Omni-Flash-Thinking. These “Flash” models were designed to improve both computational efficiency and performance efficacy, integrating new functionalities, notably the support for various dialects. ..."


My question too



Seems to be a perfect starting point. Passed on, thanks!


Do you have any information on how long each step takes? Like, how many ms does each step of the pipeline take?

I'm curious how fast it will run if we can get this running on a Mac. Any ballpark guess?


LLM and TTS latency gets determined and logged at startup. It's around 220 ms for the LLM to return the first synthesizable sentence fragment (depending on the length of the fragment, which is usually between 3 and 10 words). Then there's around 80 ms of TTS until the first audio chunk is delivered. STT with base.en is negligible at under 5 ms, and so is VAD. The turn detection model adds around 20 ms. I have zero clue if, or how fast, this runs on a Mac.
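To put those numbers together, here is a rough time-to-first-audio budget. It is a back-of-the-envelope sketch, not a measurement: it assumes the stages run strictly one after another with no network, audio buffering, or playback overhead, and it uses 5 ms as an upper bound for STT and VAD, which were only given as "under 5 ms". The stage names are hypothetical labels for the steps described above.

```python
# Back-of-the-envelope time-to-first-audio budget for the voice pipeline.
# Assumption: stages run sequentially; STT and VAD use 5 ms as an upper bound.
STAGE_LATENCY_MS = {
    "vad": 5,                  # "under 5 ms", upper bound
    "stt_base_en": 5,          # "under 5 ms", upper bound
    "turn_detection": 20,      # around 20 ms
    "llm_first_fragment": 220, # first synthesizable sentence fragment
    "tts_first_chunk": 80,     # first audio chunk delivered
}


def time_to_first_audio(stages: dict[str, float]) -> float:
    """Sum the per-stage latencies, assuming no overlap between stages."""
    return sum(stages.values())


if __name__ == "__main__":
    total = time_to_first_audio(STAGE_LATENCY_MS)
    print(f"~{total:.0f} ms to first audio chunk")
```

Under these assumptions the total comes to roughly 330 ms, and the LLM's first fragment accounts for about two thirds of it, so that is the stage most worth profiling on other hardware such as a Mac.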


This is super awesome. Several questions.

1. What GPU did you use to train the model? I'd love to train a model like this, but currently I only have a 16GB MacBook. I'm thinking about buying a 5090 if it's worth it.

2. Is it possible to use this for real time audio generation, similar to the demo on the Sesame website?


This might be a game changer for learning English.

I'm from a developing country, and it's sad that most English teachers in public schools here can't speak English well. There are good English teachers, but they are expensive and not affordable for the average person.

OpenAI's realtime models are good, but we can't deploy them to the masses since they're very expensive.

This model might be able to solve the issue since it's on par with or better than the OpenAI model, yet significantly cheaper since it's a fairly small model.



ThePrimeagen.

Sometimes I lose my spark for programming. Watching him reminds me to enjoy programming more.


I don't have time to watch him regularly, but I really enjoy that he can both walk the walk as well as pontificate on a variety of programming topics.

There are a lot of coding streamers/influencers who don't have the breadth and depth he does, nor the high level experience at a top tech company. They either did a year at Meta and became full time streamers, or are purely competitive programmers, but ThePrimeagen hits the sweet spot for me.

I think he's a great example for lots of engineers everywhere, especially those who are just getting into the industry. He didn't have the easy path of landing a FAANG internship and job right out of college; he had to work his way up to where he is now.


I used to like him, but I started to feel he keeps ranting about the same few topics:

1. neovim >>> vscode

2. X > JS (X=Rust, Go, OCaml etc)

I feel he doesn't do justice to his seniority and breadth of knowledge.

He, for sure, knows a ton. Yet chooses to waste streams on noob topics like editor wars, programming language wars etc.

I much prefer Theo-T3 because he finds interesting topics and really explores tech in depth.


> Yet he chooses to get hella views with noob topics like editor wars, programming language wars etc.

I fixed that. Those topics almost certainly perform better because they're automatically controversial.


Theo (t3) excels at this one; he really strikes a balance when talking about the opposing views of different people without being overly opinionated about it.


I find the total opposite. His video feed just reads like the lowest-effort topics from the front page of HN.

I'm not hating at all, but here's an example: https://youtu.be/ZOYp6-k9HhE?feature=shared

There's nothing insightful there that wasn't already in the HN comments a few days earlier.


There are some strong views about ThePrimeagen in this thread. I just want to share my experience. I've spent tonnes of time configuring Vim, Neovim, etc. It's ... a ... mess.

I love both editors, and for the life of me, I don't know why configuring them is soooo hard and brittle.

ThePrimeagen's videos on Vim/Neovim are by far the most information-dense ones. It took me some time to ignore his style of presentation and just focus on the content. However, the value I got out of watching his videos is undeniable. Knowing a bit of his background and how he battled addiction gave me some context. (Sorry, I can't find that video on his channel now.)

Continuing on this topic of Vim/Neovim ...

Leeren Chen (https://www.youtube.com/@leeren_) is pure genius on the topic of configuring Vim. I've never seen another person like him on Youtube, who uses Vimscript to configure Vim to make it work like an IDE (almost).

[1] https://youtu.be/JFr28K65-5E [2] https://youtu.be/Gs1VDYnS-Ac

TJ DeVries' (https://www.youtube.com/@teej_dv) videos on Neovim are awesome too (he's a core dev of Neovim). But there are a lot of gimmicks in his videos, and that can put people off. His videos with @BashBunni are very approachable for learning about and configuring Neovim.


He's a troll and authoritarian. At first he was kind of entertaining, but he's full of himself and thinks he knows it all.

And his stupid, nonsensical editor wars.

The more I watched him, the more I disliked him, and eventually I unfollowed.

And his noob topics are boring as well. He's pretty much the average HN user, only more experienced. He believes the hype. For example, htmx is hyped, so he does htmx.

His streams provide nothing for me in terms of new information that is useful.


I like to watch his blog post reads on YouTube.


+1, he is entertaining :)


I've been following the Mac gaming scene for a while. Isaac, the author of this project, is someone who contributes a lot to this space. He created Whisky and contributed to Ryujinx (a Switch emulator that works on Mac) and PlayCover (a way to play iOS apps and games on Mac). Also, he's still only 17 years old. [0]

[0] https://isaacmarovitz.com/


Hi there, it's been wonderful reading everyone's comments!


This is a tiny detail, but boy do I appreciate that he actually bothered to use the real toolkit (AppKit, SwiftUI, whatever) for the GUI… I am so, so tired of projects like Docker Desktop that cram basic UIs into Electron and think that's somehow "fine."

Thank you, Isaac. The details matter.


If you target latest macOS that's an easy choice.


Using AppKit to support older versions of macOS is not particularly difficult.


Electron is probably cheaper to implement and it works; sadly, the Chrome under the hood makes it very resource-hungry.


AppKit (or SwiftUI, I guess) is fairly simple. IB (Interface Builder) makes having a correctly behaving UI super easy, far more so than jumping through CSS hoops, and your app will behave correctly on macOS, which Electron apps do not.

Electron apps are only easier if you don’t care about good UI, just “cheap cross platform”. It’s even more obnoxious when many of these “apps” can be loaded in safari and then interact properly with the rest of the system.

Electron apps are lowest common denominator apps where the primary goal is cheap rather than good. There is no user facing metric by which an electron app is superior to a native one: they’re slower, bigger, and use vastly more memory, and result in drastically worse battery life.


You get what you pay for, I guess.



Thanks

