Purely subjectively, as someone with a hearing loss who has learned a couple other languages (plus teaching myself how to speak my native English understandably) I do think that different languages may be easier or harder for people to lipread, at least in my situation.
Mandarin encodes a great deal of the information in the vowel, including the tone. This is only partly visible, at best. (But I can hear this part just fine.) The various initial consonants are all visible on the lips, with the exception of the aspiration contrast (Pinyin t/d, g/k, etc.), which is almost invisible and probably poses the biggest challenge. A voicing contrast like in English or French is also very hard to see. (But in my case easier to hear.) On the other hand, both for lipreading and for hearing in English or French, it is the final consonants in syllables and the contrasts between various consonant clusters that I tend to lose.
I'm not sure which is innately harder; they definitely pose different challenges.
As an aside, it's an amazingly complex process cognitively, correlating often extremely subtle facial movements with particular sounds. And most of us do it! We're watching people's eyes and lips and chin and even their voice box and nostrils as they speak. And not just the hard of hearing. It's a classic and reproducible effect that adding video of the speaker's face can make quiet or noisy, borderline-unintelligible speech suddenly quite intelligible. And when researchers play tricks, such as presenting visual information that conflicts with the auditory, the visual channel will often influence or even take priority over what is actually heard: https://en.wikipedia.org/wiki/McGurk_effect
The article is interesting, but the title is clickbait:
> So where does that leave us in our central question of which language is the hardest to understand from visual cues? “The short answer is, we don’t know,” says Masapollo.
Especially given that even the paragraphs following that sentence are essentially "well, maybe it's X, Y, or Z because people there have mustaches".
I'm sure answering this question scientifically is hard, but I'd love to see something like this: take N native speakers from each of K languages, record a sentence and a paragraph in each language, ask all the speakers what they see in the silent video, and compare the results.
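A minimal sketch of how the tallying for such an experiment might look, assuming each trial records the viewer's native language, the clip's language, and whether the viewer's transcription matched (the data and field names here are invented):

```python
from collections import defaultdict

# Hypothetical trial records: (viewer_language, stimulus_language, correct)
trials = [
    ("en", "en", True), ("en", "zh", False), ("zh", "zh", True),
    ("zh", "en", False), ("fr", "fr", True), ("fr", "zh", False),
]

# Tally lipreading accuracy per stimulus language, split by whether
# the viewer was watching their own native language.
tally = defaultdict(lambda: [0, 0])  # (stimulus, native?) -> [correct, total]
for viewer, stimulus, correct in trials:
    key = (stimulus, viewer == stimulus)
    tally[key][0] += int(correct)
    tally[key][1] += 1

for (stimulus, native), (right, total) in sorted(tally.items()):
    who = "native" if native else "non-native"
    print(f"{stimulus} ({who} viewers): {right}/{total} = {right / total:.0%}")
```

Comparing the per-language native-viewer scores would be a rough first stab at "which language is hardest to lipread", though you'd need far more trials and controls for speaker, lighting, and sentence difficulty.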
> In the field of particle physics, the concept is known as Hinchliffe's Rule, after physicist Ian Hinchliffe, who stated that if a research paper's title is in the form of a yes–no question, the answer to that question will be "no". The adage led to a humorous attempt at a liar's paradox in a 1988 paper, written by physicist Boris Kayser under the pseudonym "Boris Peon", which bore the title: "Is Hinchliffe's Rule True?"
Worth reading the Wikipedia page just for gems like this.
A lot of sounds and nuances are not visible at the lip level. You may still be able to guess something from context, but an ML model based purely on the visual lip signal will fail pretty hard.
I largely agree. But tone is, in practice, more than just pitch. I know that in Mandarin it also affects vowel length, volume, and quality a little. I'm pretty sure you could do much better than chance from lip position over time alone, for example. And practical lipreading looks at much more than just lip position.
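As a toy check of the "better than chance from lip position alone" claim, here's a sketch that classifies tone from a lip-aperture trajectory; the data is entirely synthetic, and the tone-to-amplitude relationship is an invented stand-in for the real vowel length/volume/quality effects:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
T = 20  # video frames of lip aperture per syllable

def synthetic_syllable(tone):
    """Fake lip-aperture track: each tone gets a slightly different amplitude."""
    t = np.linspace(0, 1, T)
    gesture = np.sin(np.pi * t)              # open-then-close mouth gesture
    amplitude = [1.0, 0.8, 1.2, 0.9][tone]   # invented tone correlation
    return amplitude * gesture + rng.normal(0, 0.15, T)  # visual noise

tones = rng.integers(0, 4, 800)              # the four Mandarin tones
X = np.stack([synthetic_syllable(tone) for tone in tones])
X_tr, X_te, y_tr, y_te = train_test_split(X, tones, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.0%} vs. 25% chance")
```

On real video you'd extract the aperture track with a facial-landmark model rather than synthesizing it, but the point stands: any systematic visual correlate of tone lifts you above chance.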
I’ve never really had a need to lipread and never really looked into the techniques for learning it, so this may be old hat. But one thing I found very interesting: if you imagine (it’s more active than that, really, a projection that you ‘hear’) a sound coming from the person’s mouth, like a simple ‘aaaaaaaaaaaa’, then watch them speak, you’ll start to hear utterances as they move their lips. In turn you can start to string those together contextually to kind of hear what they are saying.
This probably only works if you have typical hearing, and it works better if you know the person’s voice, but that’s not absolutely necessary.
That's actually a really interesting question. For a human, I'd guess it's almost impossible, at least with any precision. But with some kind of ML or AI solution, especially one calibrated for a specific person, I think it could be achievable. At least when I whistle, I don't keep my mouth rigidly in the same shape no matter what pitch I'm whistling: the higher I go, the more I purse my lips; the lower I whistle, the lower my jaw is. Add in neck movements that track tongue and diaphragm adjustments for pitch, and you have a lot of signals to read.
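A sketch of the per-person calibration idea, with the features (lip pursing, jaw opening) and their relationship to whistled pitch invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n = 300
pursing = rng.uniform(0, 1, n)    # 0 = relaxed lips, 1 = tightly pursed
jaw_open = rng.uniform(0, 1, n)   # 0 = jaw closed, 1 = jaw dropped

# Assumed trend from the comment above: more pursing -> higher pitch,
# lower jaw -> lower pitch, plus some measurement noise.
pitch_hz = 800.0 + 900.0 * pursing - 400.0 * jaw_open + rng.normal(0, 40, n)

X = np.column_stack([pursing, jaw_open])
model = Ridge().fit(X, pitch_hz)  # the "calibration" for one whistler
print("fit quality (R^2):", round(model.score(X, pitch_hz), 2))
```

With real data the features would come from tracked facial landmarks, and you'd want held-out clips to measure accuracy, but a simple per-person fit like this is where you'd start.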
Hah, I like to joke with my Chilean friend that they drop syllables like they're going out of style. Chileñol almost sounds like Portuguese to my untrained ear.
One time I was at some random food court at a bus depot in Talca and wanted to order a dish called Chorrillana. The guy behind the counter had no idea what I was asking for when I tried to pronounce it phonetically... but then I remembered... so I asked for "Chorana" and he was like "ooooooo no problemo!"
Closer to my home: my mum, a Devonian, could make herself nearly unintelligible to me by speaking as she did as a young girl. That's my mum, not a random stranger from another country 8)
I'm no expert, but I do know that the French spoken in Quebec is rather different from that spoken in France, which itself has many varieties of "French". There's also Breton, spoken in Brittany, which is a Brythonic language closer to Welsh and Cornish than to English or French (Irish and Scottish Gaelic belong to the related Goidelic branch).
English is a right old mishmash of loosely coupled dialects, patois, pidgins, and whatever linguists call the other varieties and vagaries. I gather that the easiest variety of English for non-native speakers to understand, in general, is the Dublin accent.
I know it isn't Brummie, because I remember seeing Big Tex (he wore a Stetson on his 10-gallon head) on a Norwegian cruise out of Miami, heckling a comic on stage. I ended up translating. I kept it simple and told Tex that the comic was from Birmingham, England, because trying to explain the "Black Country" was likely to be tricky. I can just about tell the difference between a Brummie and a Yam Yam accent. The next act was a Geordie, and I gave up and suggested to Tex that we hit the bar.
Are the mouth shapes made in ASL (and other sign languages) mandatory and/or standardized? If so, I sort of had no idea they were part of the language -- having never really thought about it.