I watched the video...is this anything more than a mashup of groupchat and speech-to-text? Couldn't something similar be achieved with Google's speech to text API and IRC? I would've been impressed if the transcription was amazing, but there are errors in the video ("foreclosure" instead of "surfer culture", for one).
A very dismissive and short-sighted comment. The ambition here seems to go beyond the current implementation, to have a "magic" chat view that tracks voices and transcribes them in differing colors automatically and with minimal setup. I laud the effort and encourage the team to play this out for the sake of the hearing impaired. I hope that in the next decade deaf people--and their interlocutors--won't have to cobble together a slew of disparate technologies just to enable a group conversation.
It's not dismissive, it's inquisitive, and it lays out their reasoning why they don't understand what's special about this. In asking what's special, it gives the opportunity for proponents to address those questions specifically as to why they think it's special and different, so other readers that may have shared the original opinion get more information.
Your explanation of why you think it's special is useful and a good example of a positive outcome of the original comment. The way you initially denigrate the question is not.
It's undeniably dismissive. Yes, the first sentence is a question, but "is it anything more than X?" is a rhetorical flourish meant to imply that the product is trivially replicated. The next sentence goes on to say that he's not impressed.
As pg put it:
> Maybe you think you're making some sort of important point here. Or maybe you realize your comment is inane and you think it's witty. But (perhaps without realizing it) you and the people upvoting you represent one of the worst forces at work in the world. The people who ridicule new things when they first appear in incomplete form are one of the worst drags on innovation. [1]
I don't think it is, and I think the fact the person begins with a question asking if their assessment of the technology is correct is integral to that point.
As a real-life example, I was leaning towards the original poster's interpretation of the product. The reply it spurred helped me see the product in a different light, and I now think it has more merit than I originally did, even if I'm not sure the technologies in use, or even how they are combined, are especially new and noteworthy. As is all too often the case, it's the implementation that matters.
I think my initial opinion was dismissive, the original comment was inquisitive (if a bit critical, but I see nothing wrong with some light criticism), the reply it spurred was illuminating, and my resulting opinion was hopeful. I view that as part of HN's success, not something that needs to be overly policed.
GP here; you've used much better words to mirror my opinion. This being on Hacker News, my initial thought was that it would be an amazing technical display. Frankly, it isn't, but the discussion here has helped me realize that the reason we care about it is because it's an incredibly useful application of existing tech. That's still great—there's a ton of value outside of technical wizardry—but it just wasn't immediately clear to me after reading the article.
Hi tsm, founder of Transcense here. Beyond the impact we want to have, at some point we all build on top of others' existing technologies. Defining innovation is tricky. Is it in the technical implementation (Instagram is not that complicated, after all) or in the productization and distribution to market?
We're humbled to have been posted on HN, and not by us. Stay tuned: what's coming next will be even more interesting.
Looks to me like at least one person has already countered that claim, so it is no longer undeniably dismissive. I would say your response to his question is even more dismissive than any "dismissal" that may have been read into the OP.
> The ambition here seems to go beyond the current implementation, to have a "magic" chat view that tracks voices and transcribes them in differing colors automatically and with minimal setup.
To put it another way: real-time subtitles. Imagine having something like this in Google Glass. As a person with profound hearing loss, that blows me away.
This technology could easily be repurposed for subtitling videos.
I don't think a small team of people can be better than Google or Apple in building a speech to text technology. However, leveraging the tools available to help the deaf is the main idea - at least at the beginning - I think. Going forward, they will probably focus on their "Leap Motion" part of the project: from signs to text/voice and let big companies improve their text to speech algorithms that they would just use. Because that's where big improvement can be made.
This technology (speaker identification) is 10 years old, and [HMM/neural net] speech recognition is slightly older. So a small team could likely pull it off today just by implementing or using code published by researchers. As long as Transcense have control over the microphone(s), then it might work. Single mic/multispeaker speech recognition is still practically impossible unless the speakers take turns (not always the case).
You are completely correct -- this is, indeed, nothing more than a simple mash-up of existing commoditized capabilities. Yet there appears to be a market opportunity to sell such a service for as much as $100-300/year. You should build a competing product (a simple mash-up) and sell it.
Seems to leave the hearing impaired party as an outsider that is observing the conversation. Great step forward to giving the hearing impaired a foot in the door, so to speak. I'm curious to see where tech like this continues to develop to create an equal playing field for the hearing impaired within the conversation.
My Grandma is slowly losing her hearing - one ear is 100% deaf while her other remaining one is at about 50%. She uses a CapTel speech to text phone with a huge display to understand what is said when people call her. It generally works well, but she only has one in her family room. She struggles to hear especially when there are multiple people speaking and there is background noise. I've learned techniques to improve her comprehension, but it can only go so far (If interested, here's a few: make sure you're looking at them when talking, speak in a 'deeper tone', don't rush your words, continually repeat what was said until they understand, etc).
Almost everyone in the United States has a phone. If I could download an app that runs this program along with my cousins, and have my Grandma use her 'iPad' (Nook tablet) to understand, with the assistance of something like Transcense, that would be amazing. By linking several microphones, they may be able to cancel out background noise and only highlight the specific speaker, and that would be a fantastic advance.
I'm wondering what the current state of Transcense's speech recognition is, however. From the video, it did seem like there were some errors. I'm sure a deaf user can understand what was meant to be said using the context of the conversation, but in a business meeting a misunderstood word can change the whole meaning of the sentence or message. I've used Siri, Dragon NaturallySpeaking, et al., and while they're good, they're not perfect. Dragon in particular supposedly can be taught and learn the user's unique style of speech, so I'm also curious if Transcense will be going the route of machine learning and NLP.
Initially I was impressed, but now I worry about the logistic issues in regards to the fine details.
The main problem with general purpose real-time voice recognition is that current hardware is simply way too underpowered to accomplish the task. For instance, running the Dragon 11 SDK on an Intel Atom Z3770 leaves it up to about a minute behind in transcribing the conversation! So I fear Transcense's approach is using the inferior Google Speech API which, plainly put, sucks donkey balls and is in no way comparable to the latest Dragon engine. Apple uses the Dragon engine to implement Siri.
There's also the social burden of needing speakers to install an app on their smartphones, and to actually have a smartphone in the first place. Will this be a free "remote mic" app such as Dragon 13 provides, or does Transcense expect speakers to pony up the monthly cost as well?
I too think businesses and institutions will not allow this because they need overpriced ADA compliant solutions due to regulations. An example would be Interact-AS which is $800 and is essentially a fancy overlay for the Dragon engine (or Microsoft Speech in the low-end $150 version). Dragon itself only costs a one-time $99 to $199!
I'm also skeptical there's a viable business model in this. The vast majority of the deaf are on fixed incomes and not employed, so what is a relatively expensive $30 a month for app access buying them exactly? It better be a superior remote client-to-server transcription experience! What's to stop Dragon from enabling multiple "remote mic" apps to work all at once with the mothership PC in their next version, etc.? And if not a client-server model, what are the minimum hardware specifications to get "one second" transcriptions? A $599 smartphone is a ridiculous and overpriced luxury for the deaf.
As for Google Glass, it is a non-starter. No one wants to look like an idiot constantly staring off into their peripheral vision to read text instead of looking at whoever is speaking -- which is why Google Glass has been such a massive failure. What is truly needed is spatially aware augmented reality where the transcriptions are placed over whoever is speaking, by beaming text onto normal glasses or directly onto the retina. This technology already exists in various forms; it is just a matter of a real-world implementation into a "killer app". Transcense, are you paying attention?
Nonetheless, this is a very important step forward that no one else is really doing, so I'm in for $250... and holding my breath.
Wow, nice insight. I am going to try a prototype where a deaf person looks at the speaker and sees, on the retina or on glasses, what that person is saying... that's beautiful and amazing. Yes, I love Transcense if they can do this well, and I believe a retinal display is a must for the deaf and mute. Imagine their world suddenly coming alive, where they can experience sound on the retina or on glasses. It's beautiful, real work.
What does this have to do with the confidentiality issues that come with sending your full meeting audio and transcripts into the hands of a third party?
I thought the same thing - until I read this:
"It works by catching conversations from the voices of different individuals and assigning them a color bubble so the deaf person knows who said what. It works with a distributed microphone system on all the devices using the app so that it can distinguish each person from another."
What makes this better than just assigning every microphone to an individual?
Is it possible to have 4 individuals and 2 microphones? Then I would be impressed, but if there have to be as many microphones as individuals, then it just seems over-engineered.
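For reference, the simple one-phone-per-person case could look roughly like the sketch below. This is only a guess at the general idea, not Transcense's actual method: the function names, the "loudest phone wins" rule, the silence threshold, and the color palette are all made up for illustration.

```python
import math

def rms(frame):
    """Root-mean-square level of one audio frame (a list of float samples)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def attribute_speaker(frames_by_owner, silence_threshold=0.01):
    """Pick who is talking during one time slice.

    frames_by_owner maps a phone owner's name to the samples that phone
    captured during the slice. Assumed rule (hypothetical): the phone that
    hears the loudest signal belongs to the current speaker.
    Returns None when every phone is below the silence threshold.
    """
    if not frames_by_owner:
        return None
    levels = {owner: rms(frame) for owner, frame in frames_by_owner.items()}
    loudest = max(levels, key=levels.get)
    return loudest if levels[loudest] >= silence_threshold else None

def bubble_color(owner, assigned, palette=("blue", "green", "orange", "purple")):
    """Give each speaker a stable bubble color, cycling through the palette."""
    if owner not in assigned:
        assigned[owner] = palette[len(assigned) % len(palette)]
    return assigned[owner]
```

Note that this only covers as many microphones as individuals. The 4 people and 2 microphones case would need actual speaker identification on the voices themselves, which this sketch does not attempt.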
Still, snarkiness aside, it's worth asking every once in a while if the problem you're solving is something that actually merits an app at all (or, for that matter, a computer).
That said, there does seem to be promise here, as there's a reason we video conference (or talk in person) rather than text-message for everything.