I recently bought a mint-condition ALF phone, shaped like Gordon Shumway of TV's "ALF", out of the back of an old auto shop in the south suburbs of Chicago, and naturally did the most obvious thing, which was to make a Gordon Shumway phone that has conversations in the voice of Gordon Shumway (sampled from YouTube and synthesized with ElevenLabs). I use https://github.com/etalab-ia/faster-whisper-server (I think?) as the Whisper backend. It's fine! Asterisk feeds me WAV files, an AGI script feeds them to Whisper (running locally as a server) and does audio synthesis with the ElevenLabs API. Took like 2 hours.
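The glue code is tiny. Here's a rough sketch of the middle of that pipeline; the Whisper server's port, endpoint path, and response shape are assumptions based on its OpenAI-compatible API, and the voice ID and reply text come from elsewhere:

    import requests

    WHISPER_URL = "http://localhost:8000/v1/audio/transcriptions"  # assumed port/path
    TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

    def transcribe(wav_path: str) -> str:
        # send the WAV Asterisk recorded to the local Whisper server
        with open(wav_path, "rb") as f:
            resp = requests.post(WHISPER_URL, files={"file": f})
        resp.raise_for_status()
        return resp.json()["text"]

    def speak(text: str, voice_id: str, api_key: str, out_path: str) -> None:
        # synthesize the reply with the ElevenLabs API
        resp = requests.post(
            TTS_URL.format(voice_id=voice_id),
            headers={"xi-api-key": api_key},
            json={"text": text},
        )
        resp.raise_for_status()
        with open(out_path, "wb") as f:
            f.write(resp.content)  # audio bytes for Asterisk to play back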
Whisper.cpp and faster-whisper are a good bit faster than OpenAI's reference implementation. I've found the larger Whisper models to be surprisingly good in terms of transcription quality, even with our young children, though I'm sure it varies by speaker; no idea how well it handles heavy accents.
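For reference, the faster-whisper side is only a few lines (model size and compute type here are illustrative; smaller models trade accuracy for speed):

    from faster_whisper import WhisperModel

    model = WhisperModel("large-v3", device="auto", compute_type="int8")
    segments, info = model.transcribe("recording.wav")
    print("".join(segment.text for segment in segments))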
I'm mostly running this on an M4 Max, so decent hardware, but not an exotic GPU or anything. With that setup, multiple sentences usually transcribe quickly enough that it doesn't really feel like much of a delay.
If you want something polished for system-wide use rather than rolling your own, I've been liking MacWhisper on the Mac side; still hunting for something on Arch.
Honestly, I've gotten really far by simply transcribing audio with Whisper, having a cheap model clean up the output so it makes sense (especially in a coding context), and copying the result to the clipboard. My goal is less about speed and more about not touching the keyboard, though.
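A minimal sketch of that flow, assuming faster-whisper locally, an OpenAI-compatible endpoint for the cleanup model, and pyperclip for the clipboard (all three are swappable):

    import pyperclip
    from faster_whisper import WhisperModel
    from openai import OpenAI

    model = WhisperModel("small", compute_type="int8")
    client = OpenAI()  # any OpenAI-compatible endpoint works

    def dictate(wav_path: str) -> str:
        # 1. transcribe locally
        segments, _ = model.transcribe(wav_path)
        raw = "".join(s.text for s in segments)
        # 2. cheap-model cleanup pass
        cleaned = client.chat.completions.create(
            model="gpt-4o-mini",  # stand-in for whatever cheap model you prefer
            messages=[
                {"role": "system", "content": "Fix punctuation and obvious "
                    "transcription errors. Preserve code identifiers verbatim. "
                    "Output only the corrected text."},
                {"role": "user", "content": raw},
            ],
        ).choices[0].message.content
        # 3. result goes straight to the clipboard
        pyperclip.copy(cleaned)
        return cleaned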
Thanks. Could you share more? I'm about to reinvent this wheel right now. (Add a bunch of manual find-replace strings to my setup...)
Here's my current setup:
vt.py (mine) - voice type - uses PyQt to show a status icon and registers global hotkeys for start/stop/cancel recording. Formerly used 3rd-party APIs, now uses parakeet_py (patent pending).
parakeet_py (mine): a Python binding for transcribe-rs, which is what Handy (see below) uses internally (it's just a wrapper around Parakeet V3). Claude Code made this one.
(Previously I was using voxtral-small-latest (Mistral API), which is very good except that sometimes it will output its own answer to my question instead of transcribing it.)
In other words, I'm running Parakeet V3 on my CPU, on a ten-year-old laptop, and it works great. I just have it set up in a slightly convoluted way...
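If it helps, the rough shape of it is below. parakeet_py is my own binding, so transcribe_file() is a hypothetical name for that call; pynput and sounddevice stand in for the PyQt hotkey/recording plumbing:

    import numpy as np
    import sounddevice as sd
    import soundfile as sf
    from pynput import keyboard

    SAMPLE_RATE = 16000
    chunks: list[np.ndarray] = []
    recording = False

    # microphone stream; the callback just buffers audio chunks
    stream = sd.InputStream(
        samplerate=SAMPLE_RATE, channels=1,
        callback=lambda data, frames, time, status: chunks.append(data.copy()),
    )

    def on_toggle():
        global recording
        if not recording:
            chunks.clear()
            stream.start()
        else:
            stream.stop()
            if chunks:
                sf.write("/tmp/vt.wav", np.concatenate(chunks), SAMPLE_RATE)
                # text = parakeet_py.transcribe_file("/tmp/vt.wav")  # hypothetical API
        recording = not recording

    # one start/stop hotkey; the real thing adds cancel + a status icon
    with keyboard.GlobalHotKeys({"<ctrl>+<alt>+v": on_toggle}) as hk:
        hk.join()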
I didn't expect the "generate me some Rust bindings" thing to work, or I would probably have gone with a simpler option! (Unexpected downside of Claude being really smart: you end up with a Rube Goldberg machine to maintain!)
For the record, Handy - https://github.com/cjpais/Handy/issues - does 80% of what I want. It gives a nice UI for Parakeet. But I didn't like the hotkey design or the lack of flexibility for autocorrect etc. ... and I already had the muscle memory from my vt.py ;)
My use case is pretty specific: I have a 6-week-old baby, so I've been walking on my walking pad with her in the carrier. Typing in that situation is really not pleasant for anyone, especially the baby. Speed isn't my concern; I just want to keep my momentum in those moments.
My setup is as follows:
- Simple hotkey to kick off shell script to record
- Simple Python script that uses inotify to watch the directory where audio is saved, then transcribes with Whisper. The same script runs the transcription through Haiku 4.5 to clean it up; I tell it not to modify the contents, but it's Haiku, so sometimes it does anyway. The original transcript and the AI-cleaned version are dumped into a directory (a sketch of this step follows the list)
- The cleaned-up version is run through another script that decides whether it's code, a project brief, or an email. I usually start the recording with "this is code" or "this is a project brief" to make it easy. Then, depending on what it is, the original, the cleaned transcript, and the context get run through different prompts with different output formats.
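Here's the promised sketch of the watcher step, assuming inotify_simple for the directory watch and the Anthropic SDK for the Haiku pass; the path, prompt, and model choices are placeholders for whatever you run locally:

    import anthropic
    from faster_whisper import WhisperModel
    from inotify_simple import INotify, flags

    WATCH_DIR = "/home/me/recordings"  # placeholder path

    whisper = WhisperModel("small", compute_type="int8")
    client = anthropic.Anthropic()

    def transcribe(path: str) -> str:
        segments, _ = whisper.transcribe(path)
        return "".join(s.text for s in segments)

    def clean_up(raw: str) -> str:
        # told not to modify the contents; it sometimes does anyway,
        # which is why both versions get saved below
        msg = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=2048,
            system="Clean up this transcript. Do not change its meaning or add content.",
            messages=[{"role": "user", "content": raw}],
        )
        return msg.content[0].text

    inotify = INotify()
    inotify.add_watch(WATCH_DIR, flags.CLOSE_WRITE)
    while True:
        for event in inotify.read():
            if not event.name.endswith(".wav"):
                continue
            raw = transcribe(f"{WATCH_DIR}/{event.name}")
            cleaned = clean_up(raw)
            stem = event.name.removesuffix(".wav")
            with open(f"{WATCH_DIR}/{stem}.raw.txt", "w") as f:
                f.write(raw)
            with open(f"{WATCH_DIR}/{stem}.clean.txt", "w") as f:
                f.write(cleaned)
            # the next script routes on the spoken prefix:
            # "this is code", "this is a project brief", ...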
It's not fancy, but it works really well. I could probably vibe code this into a more robust workflow system all using inotify and do some more advanced things. Integrating more sophisticated tool calling could be really neat.
Agreed. I just launched https://voice-ai.knowii.net and am really a fan of Parakeet now. What it manages to achieve locally without hogging too many resources is awesome.
Speechmatics - it's on the expensive side, but it supports a bunch of languages and the accuracy is phenomenal across all of them, even with multiple speakers.
I tried Whisper, but it's slow and not great.
I tried the gpt audio models, but they're trained to refuse to transcribe things.
I tried Google's models and they were terrible.
I ended up using one of Mistral's models, which is alright and very fast, except that sometimes it responds to the text instead of transcribing it.
So I'll occasionally end up with pages of LLM rambling pasted instead of the words I said!