Fair point. We are working on releasing this in the App Store where there won’t be a need to enter credit card details - billing will go through Apple and cancelling subscriptions is easier. Happy to provide a promo code for a one month free trial without a card if you email us at [email protected].
Thank you, the Android version should be out imminently but Google takes 2 weeks to review and then rejects it because I implied “the news” had endorsed it. Fingers crossed early in the week!
I fell down the rabbit hole of voice transcription about a year ago and have always had a love for utilising fine-tuned LLMs, so I put two and two together and built https://whistle-enterprise.com. The biggest challenge is that it all runs on CPU, with the target device being your low to mid spec office laptop that's a few years old (i5, 8 GB RAM). It's all nicely packaged together in a single, completely offline, self-contained app that you just install and run (no environment setups, packages to download, models to download etc).
One of the hardest parts I've found is the diarisation (who said what) side of things. Trying to tune this and have it working in a way that doesn't absolutely grind the laptop to a halt or take forever to complete has been _hard_ but also extremely rewarding.
Another part has been fine-tuning the Phi-4 model. I'm on version 10 now; getting that pipeline down was a journey in itself, but I've got some great results. I wrote a bit about it in a comment here - https://news.ycombinator.com/item?id=48385906#48389625
I absolutely love working on this, I still wake up and the first thing I think about is voice transcription pipelines (sad I know), but I'm excited to see how much further performance and utility I can squeeze out.
Are we the same person?? Haha, this is super close to the scope of work I've been doing and just released. Different objectives though. It sounds like yours prioritizes legacy hardware and is more enterprise focused (good for you!). Mine is focused more on long-term project tracking and program management for solo developers or solo builders.
I also got hammered when it came to diarization... I found that the biggest pain was creating an appropriate environment for cross-compatibility of the different backends required for whisper/faster-whisper/pyannote. It's especially challenging on older systems, so major kudos for giving it a shot.
Have you gotten any traction yet from the community?
> Mine is focused more on long-term project tracking and program management for solo developers or solo builders.
This looks very useful, will download and give it a shot later. It took me a few seconds to find it on your page, and only got to the screenshots in the "navigate" sections after clicking through a lot. I would suggest putting a screenshot or something on the landing page so people can see and understand what it is.
Thank you, it's nice to hear someone else has gone through similar pain (in a good way)!
It's been slow and steady, but it's hard. I've commented previously that whilst the cost to build software has plummeted compared to 2 or 3 years ago, the ability to sell it has got harder and I feel this will keep accelerating.
This was built just for them so I've not spent too much time on the UI (ignore `unstable` in the name, it's just not on a proper release branch) but it's completely free so give it a go if you want. I'm working on the diarisation step so it can tag subtitles to people but that's not ready yet.
It utilises nvidia Parakeet as the ASR model, it is very much European language focused, the supported ones are:
If these languages aren't what you're looking for let me know what you need and I'll see what I can do.
I use subtitles extensively for everything I watch, so if I can help someone make older movies more accessible with them then that would make me happy.
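For anyone curious what the subtitle export step roughly amounts to, here's a minimal sketch of turning transcription segments into SRT text. It assumes the ASR stage gives you segments with start/end times in seconds; `Segment`, `srt_timestamp` and `to_srt` are made-up names for illustration, not what the tool actually uses.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed chunk (hypothetical shape, times in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text.strip()}\n")
    return "\n".join(blocks)
```

Once diarisation is in place, tagging a speaker would presumably just mean prefixing `seg.text` with a name before rendering, but that part is a guess.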
I hate to do the "you're holding it wrong" trope, but I think you might have something misconfigured somewhere unless you missed a 0, because just past 60k tokens is such a small context window to be seeing issues in.
Do you have any old documentation that it's picking up and referencing? If you set all claude settings back to default do you see the same issue?
n=1, but a friend of mine spent the last few months working on an experimental piece of music software with Claude. What he built is amazing and far beyond my abilities (I have been programming for 20 years). He doesn't know any programming.
In fact, it's far beyond what I would even attempt, because I've just spent two decades building up a data bank of how hard things are supposed to be.
He doesn't know it's supposed to be hard, so he just does it.
Is his code maintainable, though? Or is it just a pile of code which happens to work? What if he wants to change something? Does he regenerate the whole thing from scratch? Or does he tell Claude to make the changes without even knowing when something breaks as a new thing is added? (Assuming the software is complex, with multiple non-trivial features.)
Claude Code does not regenerate an entire project when you ask it to make one change. It just makes the change.
He's been working on it for several hours per day for several months.
He has occasionally complained to me about the stupidity of AI. Nevertheless, his achievement is remarkable. He simply persisted despite the stupidity.
It does occasionally break things when adding new features. I think it does it less often than I do, though.
(My "random error" rate is quite high, and scales with the complexity of the code base. Fortunately, the Transformer has a slightly higher working memory than I do.)
I will grant though, that he shipped it with zero thought for performance. "Damn, it works so well on my machine though", he said, having the best machine in the world! I'm not sure that's the LLM's fault though. I ran into disregard for performance often, before LLMs!
There’s no free lunch, it takes time and effort still. And expertise if you need it to be robust.
In terms of velocity, let me offer some numbers. In 6 months I generated >150k lines of code and merged 10k PRs to ship and iterate on https://plotalong.app
I follow best practices and isolate agents to continuously deployed dev environments, semi-manually review PRs and gate the release process between multiple protected envs. The project is getting close to 500 end-to-end tests in Playwright.
That’s just working nights and weekends. Before AI, it took my team at the office 4 years to produce this much work. There are some qualitative differences, but the speed and results are real.
Thank you for the assumption, I'm actually not a developer at all.
I'm from a hardware / networking / infrastructure background. I've had extensive exposure to (web) application development as I'm working closely with development teams and I do have the bash/powershell scripting knowledge.
But honestly, if I tried this "the old fashioned way" it probably would have taken me about 6 to 7 years to develop that application, that's an optimistic estimate. You really do have to have a passion for what you're building, I didn't know that voice transcription and local LLMs would be such a driving force for me, but it's all I think about, so much that I find it hard to go to sleep sometimes.
This one works well. I think it's because there's no shine to it, it's just the data, what you need, right there without trying to fluff it all out with rounded edges and superfluous stuff.
I find it such a hard thing to quantify, I know it's not helpful but you can just feel the slop seep through.
I'm not sure if it's because I've iterated through so many sites that LLMs have produced that "slop" is instantly recognisable and it just feels soulless.
Not like web pages ever had a soul, but it's not there on the generic LLM generated sites.
I think it’s the fact that my eyes have been blasted with a certain visual ‘vibe’, and I’ve come to associate it with apps that are, on average, a bit lazy.
This absolutely fascinates me. Yesterday a friend needed subtitle files generated from audio for use in CapCut, yet none of the available stuff was suitable, so he asked if I could adapt some of my software to export subtitles.
2 hours later he's got a fully working piece of local software that does exactly what he wants, yet yours is not able to even sort dates correctly. Feel free to download it if you want to see for yourself, I didn't even do any UI tweaks as this was just a tool for him to use:
> How can there be such a massive gap in what can be produced?
What I was doing looks really nice and mostly works on the surface, but the bugs appear in the corner cases. On another day, while doing security analysis against the latest Flutter/Dart version on Android, I was able to generate a Frida script with LLM help that bypasses Dart certificate pinning/validation and proxies all the traffic by injecting the runtime binaries.
Ahhh ok I totally understand what you mean. Yea the edge cases are absolutely where you start to feel the pain, and things look good on the surface until you dig in. I think even in the age of LLMs the adage that 90% of the time is spent on the last 10% will ring true.
Sure, an app can be built and spun up in an afternoon, but are you willing to spend another 6 months ironing out all those little bugs, tuning it a bit, testing, tweaking, testing etc.?
This seems so wide reaching if it's catching simple things like explaining a paper. Does this also refuse to help with any already developed training pipelines?
I can kind of understand the generation of synthetic data, but nerfing the assistance of training pipelines just seems like a really shitty thing to do.
Maybe allow a new user to view previous data for their specific investment.