Brain-to-Speech Tech Good Enough for Everyday Use Debuts in a Man with ALS (scientificamerican.com)
39 points by nilv on Aug 16, 2024 | 18 comments


Co-senior author here. I'm a long-time Hacker News reader. Very excited to see that this work made the front page. I'm happy to take questions.

The Scientific American article covers our lab's first major paper, "An Accurate and Rapidly Calibrating Speech Neuroprosthesis," published 14 August 2024 in the New England Journal of Medicine [0].

This video nicely summarizes the project [1].

Some additional coverage can be found in the New York Times[2] and Bloomberg[3].

Our lab's website can be found here (written in Elm) [4]. The software platform that this BCI uses can be found here [5].

[0] https://www.nejm.org/doi/full/10.1056/NEJMoa2314132

[1] https://www.youtube.com/watch?v=thPhBDVSxz0

[2] https://www.nytimes.com/2024/08/14/health/als-ai-brain-impla...

[3] https://www.bloomberg.com/news/articles/2024-08-14/brain-tec...

[4] https://neuroprosthetics.science/

[5] https://iopscience.iop.org/article/10.1088/1741-2552/ad3b3a


Congratulations on achieving this, I teared up watching that man’s face in your linked summary video when he was able to communicate with your device!

It looks like there are three components: 1.) the device itself 2.) a microphone 3.) eye tracking (??)

Is that right? If so, to what degree does a person’s ability to make at least utterances into the microphone matter to the overall system’s performance? Are you heading in the direction of purely using the device itself to determine what patients are intending to say?


Thank you for the kind words!

A brain-computer interface (BCI) is a device that records brain signals and allows people with paralysis to control objects in their environment using their thoughts. Here, we demonstrate how the BCI can help restore communication to a man with ALS using only neural signals recorded from the speech-related motor areas of the brain (ventral precentral gyrus). As he tries to speak, our decoding system translates the neural activity into words on a screen. So, to answer your question: we use only brain signals to determine what he's trying to say.
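
If it helps to see the shape of the computation, here's a minimal toy sketch in Python. To be clear, this is not our decoder; the phoneme set, lexicon, and every function here are invented for illustration. It only shows the flow from (time x channels) neural features, to per-timestep phoneme probabilities, to a dictionary lookup:

    # Toy sketch only, NOT the decoder from the paper. It shows the shape
    # of the computation: (time x channels) neural features go in,
    # per-timestep phoneme probabilities come out, and a lexicon maps the
    # decoded phoneme sequence to a word.
    import numpy as np

    PHONEMES = ["HH", "AE", "P", "IY", "SIL"]     # tiny toy inventory; SIL = silence
    LEXICON = {("HH", "AE", "P", "IY"): "happy"}  # stand-in for a large dictionary

    def phoneme_posteriors(features):
        """Stand-in for a trained neural network: a random linear layer plus
        softmax, used here only to produce correctly shaped probabilities."""
        rng = np.random.default_rng(0)
        logits = features @ rng.normal(size=(features.shape[1], len(PHONEMES)))
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def greedy_decode(posteriors):
        """Best path: argmax each timestep, collapse repeats, drop silence.
        (A real decoder would use a beam search with a language model.)"""
        best = posteriors.argmax(axis=1)
        seq = [PHONEMES[p] for i, p in enumerate(best) if i == 0 or p != best[i - 1]]
        return tuple(ph for ph in seq if ph != "SIL")

    features = np.random.default_rng(1).normal(size=(50, 256))  # 50 time bins x 256 channels
    phones = greedy_decode(phoneme_posteriors(features))
    print(LEXICON.get(phones, "<no dictionary match>"))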

The microphone is not involved in interpreting his speech at all (in fact, speech-to-text algorithms applied to the audio produce nonsensical output). One can hear him speak in the supplementary videos of the NEJM paper, and in the video produced by UC Davis.

The eye-tracker is used to help control the GUI, much like an able-bodied person would use a mouse.


Fascinating, thank you for your response. Best of luck as your team’s work progresses!


This is so fascinating and amazing, congratulations on the achievements you and your team have made.

What are some current limitations of this technology?

Where do you think the trajectory will lead in 5 years in terms of capability?

Are there long term health and safety risks that have been discovered through the various implementations of this technology since 2004?


I'm optimistic about the future! Basic science laboratories have provided the foundational insights needed for clinical trials to explore BCI technology in people with paralysis. Now that academics have de-risked BCI development, multiple companies are developing fully implanted devices, including Neuralink, Paradromics, Synchron, and Precision Neuroscience. I hope that within the next 5 years I will be able to prescribe one (or perhaps one of many!) of these devices to patients.

The progress described in the NEJM paper is happening as part of an ongoing clinical trial called BrainGate2, which has been running for over 20 years. An interim safety analysis covering the trial's first 17 years was recently published [0].

[0] https://pubmed.ncbi.nlm.nih.gov/36639237/


Does the device/algorithm have to be trained on a specific person's brain/physical parameters? Or could this be swapped to another arbitrary person and work essentially the same?

The thought behind the question: how similar is everyone's process for constructing the word "happy"?


This really is a landmark success. Their patient, Harrell, who has ALS, can reliably use their brain-to-speech interface every day (<3% error rate) to talk to his family.

Big congrats to the team at UC Davis.


This is incredible to me: the kind of science fiction I actually want to see become fact in situations like this.

I'm wondering a bit about this claim, though, and hoping someone more familiar with the field can shed some light on it:

> The device predicts the wrong word less than 3 percent of the time, an error rate on par with nondisabled speakers reading a paragraph aloud.

Does this mean that in a paragraph of 100 words, nondisabled readers are expected to get 3 words wrong? That would be a lot higher than I would expect, but that's what the quoted sentence sounds like it means.

Or is it more like if a hundred nondisabled people read a standard paragraph out loud, three of them would trip over (at least) one word in it? This seems closer, but still like a fairly high error rate for something they're calling "perfect."


3% in unprepared reading aloud seems reasonable if we're talking minor hiccups. In normal speech, or anything prepared, it's way too high.


Which "3%" stat are we measuring, though? That's the part I don't understand. Does it mean we would expect 100% of people to miss an average of 3% of the total words, or does it mean we would expect 3% of the people to miss one (or more) words? Those are very different stats, and I'm not sure which is meant here.


Seems very reasonable to me. Note that this statistic applies to an average person, not to an average genius.


If I think something like "OMG, please STFU", will the system vocalize it? This is a real question.


Thanks for the question. If our participant were to just "think" something, it wouldn't work. This is not a mind-reading device. Instead, he has to actually try to say it for it to work.

The system maps a sequence of phonemes decoded from neural signals into words on a screen. The words are selected from a 125,000-word dictionary, and he can add custom words to the list.

So, to answer your question, if he were to add "OMG" and "STFU" to the dictionary, then he could say them (if he wanted to).
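
As a toy illustration of that constraint (the phoneme spellings and function names are invented for the example):

    # Toy illustration of the vocabulary constraint: the system can only
    # output words that exist in its lexicon, so a novel acronym has to be
    # added before it can be "said". Phoneme spellings are made up.
    lexicon = {("HH", "AE", "P", "IY"): "happy"}

    def add_custom_word(word, pronunciation):
        lexicon[pronunciation] = word

    def emit(decoded_phonemes):
        # A phoneme sequence with no lexicon entry can't become a word.
        return lexicon.get(decoded_phonemes, "<not in dictionary>")

    omg = ("OW", "EH", "M", "JH", "IY")
    print(emit(omg))             # <not in dictionary>
    add_custom_word("OMG", omg)
    print(emit(omg))             # OMG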


Thank you for taking the time to answer my question.


Not as I understand it. The device is apparently hooked to the neurons that are responsible for activating muscle tissue that would ordinarily articulate speech.

It is not connected to the "part" where one's inner voice lives, because nobody has a clue where that is situated.


If you also want to see and hear him (and the team), I found the video in this other article: https://www.brown.edu/news/2024-08-14/bci-speak-again


Impressive. I didn't find any mention of the speed. How many words per minute does this allow for?



