Neurrone's comments

> Though you can also get trapped in a fractal of polish-chasing

I think this applies to anything :)

> You quickly realize the behavior you see is an idiosyncrasy in the screen reader itself.

Yeah, this is definitely a pain point during development. There are standards and ongoing efforts to reduce these differences though, so I hope this gets better over time.


Incredible article. I never knew there were so many JavaScript runtimes out there, and that it was possible to run JavaScript on microcontrollers.


Thanks for reading :)


Yes, exactly.


Yes, that is how I usually consume my content. Cognitive load is actually lower for unstructured prose compared to code; think of fiction, for example. Code is much denser.

When I read to relax, it is for enjoyment, so I don't aim to read as fast as possible. This is why I still listen to human narrated audiobooks, since a good narrator adds to the experience.


> Casually dropping that everyone is speaking so slow. That you must use the time between sentences for something meaningful, pretty funny. :-)

Didn't mean it that way. But that is truly the only way I can use the computer while in a meeting.


If I'm listening to a podcast or a video at 2x or 3x, coming back to normal speed feels glacial by comparison.

If you're used to 800 wpm bursts, I'd assume normal speaking pace will feel slow any way you cut it.

On a slightly related note, have you felt this affecting your speech patterns?


> If you're used to 800 wpm bursts, I'd assume normal speaking pace will feel slow any way you cut it.

Actually that isn't really the case. That might happen if you're asking someone to read something to you for an extended period of time, but that's not how normal conversations happen.

Edit: if you're referring to videos or podcasts, then yes, assuming the objective is to get information as quickly as possible.

> On a slightly related note, have you felt this affecting your speech patterns?

Nope.


I was referring to how synthetic speech always talks in exactly the same way, down to inflection and pauses, whenever it encounters the same phrase, which isn't how people talk. That consistency helps a lot with comprehension. The structure of the content helps as well.


Yup, it does. I was an early adopter of VS Code. It has been extremely satisfying seeing the progress they've made with accessibility. I provide feedback on a semi-frequent basis; nowadays it's only to flag regressions.


Author here. Surprised this somehow got onto HN, since I only posted it on Mastodon.

Happy to answer any questions.


Great post. When you're reading a Mermaid diagram, do you just happen to have memorised that "dash dash greater than" means "arrow"? I assume the screen reader doesn't understand ASCII art.
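For reference, a minimal Mermaid flowchart source looks something like this (my own toy example), where the "dash dash greater than" is the --> arrow between nodes:

    graph LR
        A[Draft post] --> B[Publish]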

And how painful is reading emails? HTML email is notoriously limited compared to HTML (and CSS) in the browser, but it's pretty hard to add structure to a plain text email too. How annoying is it when I do so using e.g. a "line" made out of repeated dashes?


Oh boy, don't get me started on emails. HTML emails are such a pain because of the hacks needed to get them to render properly across multiple devices. So I hear a lot about tables being used purely for layout, which is a pain because those tables aren't semantically meaningful at all. And then there are emails that are just one or more images.

For a line of dashes like "-------", most screen readers can recognize repeating characters, so that string gets read for me as "7 dash". If using an <hr> element, then there is no ambiguity about what it means.
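A minimal sketch of the two cases:

    <!-- Announced as something like "7 dash" -->
    <p>-------</p>

    <!-- Announced as a separator; no ambiguity -->
    <hr>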


Users of the email client mutt have a similar problem: it doesn't render HTML or CSS and displays them as raw text. So they've developed a variety of workarounds, like pushing the email body through a terminal web browser before showing it in mutt.

Might work for you too.
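The usual setup is something like this (assuming you have w3m installed; lynx -dump works too):

    # ~/.mailcap: convert HTML parts to plain text with w3m
    text/html; w3m -dump -T text/html %s; copiousoutput

    # ~/.muttrc: render HTML parts inline using the mailcap entry
    auto_view text/html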

Edit: Also, do you MUD?


Oh yes. It was one of my formative childhood experiences. My first mud was Alter Aeon, but I haven't played in almost 10 years. I enjoyed myself during the 5 years or so that I played and got to know a lot of people. The first thing I ever programmed was a bot to automatically heal group members.

Then Empire Mud, but I left due to disagreements with the admin. I loved the concept but it didn't really have the playerbase to support it.

More recently, I was on Procedural Realms. But I was affected by 3 separate instances of data corruption / loss, the last of which resulted in an unplanned pwipe (player wipe) since there were no offsite backups and the drive on the server failed. Years of progress gone due to lack of backups, so I'm never going back.

Ever since, I've been trying to find something else. Perhaps I'm just getting older but I don't have the patience to grind that I once had, which rules out most hack and slash muds. These days, I prefer something with interesting quests, places to explore and mechanics.

What muds do you play?


Neat. I mostly play Discworld MUD, which isn't very often due to small kids these days. It's a good all-rounder, has both fine grind and massive amounts of quests, exploration and crafting. Over the years I've become friends with many screen reader users there, and some of them were the fastest hunting group leaders I've seen.

http://discworld.starturtle.net/lpc/


I tried it briefly but bounced off after the tutorial finished; I couldn't figure out what to do.

Is reading the books required for enjoyment? I haven't read anything from the Discworld series.


After the tutorial, if you choose Morporkian as your language and Ankh-Morpork as your starting location, you'll be put in one of the busiest places in the world, outside a bar. Either outside or inside you'll find people who can help you get started. The 'say' command says something to the entire room, and 'tell username message' sends them a private message.

There's also a newbie group chat where you can ask for help, the syntax is 'newbie' followed by your message. It'll go away once you get too many levels in your skills.
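A hypothetical first exchange (the player name is made up) might look like:

    say Hi, I just finished the tutorial. What should I do next?
    tell somehelper Thanks, I'll head that way.
    newbie Where can I find my first quest?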

A drawback with Ankh-Morpork is that it has cops; they might interfere if you decide to attack something that isn't a rat or cockroach or somesuch, but if you get caught and put in jail you'll eventually be released. Getting killed is a bit worse: you either waste your experience points by getting a raise from an NPC, or send a message to a particular type of priest that can resurrect you.


Thanks, I'll try this out again sometime.


Really liked the article.

The interesting part for me was that you can recognize synthetic speech much faster than human speech. Is there a specific voice you're using for 800 wpm, or can it be any TTS? Also, I think older voices sound more robotic than the newer ones (I mean pre-AI; e.g. the default on Android counts as newer for me). Is there a difference in how fast you can listen to the newer, nicer-sounding voices versus the older, more robotic ones?


> Is there a difference in how fast you can listen to the newer, nicer-sounding voices versus the older, more robotic ones?

Yes. The main requirements for the TTS I use are that it must be intelligible at very high rates of speed and it must have no perceivable latency (i.e., how long it takes to convert a string of text into audio). This rules out almost all voices, since many of them focus on sounding as human as possible, which comes at the expense of intelligibility at high rates. The newer voices also usually don't have low latency.

> Is there a specific voice you're using for 800 wpm, or can it be any TTS?

I'm using ETI Eloquence. If I switched to another voice that stays intelligible at high speeds, such as ESpeak, I would have to slow down, since I'm not used to it and would have to train myself back up to the speeds I'm used to.


Thank you for the answers. Even though I'm not new to TTS usage, this feels a bit like cyberpunk to me: a neural interface that can provide you information as fast as you can consume it, not just as fast as your "ears" can recognize it. Like a human modem.


I've added a section about TTS voices to the post, see https://neurrone.com/posts/software-development-at-800-wpm/#...


Great article. I was of course surprised to learn that it's possible to understand such super-fast TTS, since videos and podcasts start to get very tough to follow at around 2.5x and higher. I've been wondering: surely better algorithms for generating high-speed speech are possible, especially as we have more and more compute to throw at the problem. It's not easy to search for, since "speed" for most tools means speed of generation rather than wpm. As normal-speed neural TTS models get incredibly good, I'm hoping to see more attention paid to the high-speed use case.


I've added a section about TTS voices to the post, see https://neurrone.com/posts/software-development-at-800-wpm/#...


Yeah, the options for this are quite limited. The only ones I know of are Espeak (open source), which doesn't sound as good, and Eloquence, which is an abandoned product.

The use cases for super-high-speed TTS are pretty niche though.
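If you're curious what high-rate synthesis sounds like, espeak-ng exposes the speaking rate directly; the -s flag takes words per minute (default 175):

    # Roughly conversational pace
    espeak-ng -s 175 "Testing one two three"

    # Much faster; screen reader users push well beyond this with rate boost
    espeak-ng -s 450 "Testing one two three"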


Thanks for the blog post!

I was wondering what TTS voices you use. I've heard from other blind people that they tend to prefer the classic, robotic voices rather than modern ML-enhanced voices. Is that true in your experience, too?


That was my initial thought, too - "I bet they can use a nicer voice now!"

Sounds like the robotic voice is more important than we give it credit for, though - from the article's "Do You Really Understand What It’s Saying?" section:

> Unlike human speech, a screen reader’s synthetic voice reads a word in the same way every time. This makes it possible to get used to how it speaks. With years of practice, comprehension becomes automatic. This is just like learning a new language.

When I listened to the voice sample in that section of the article, it sounded very choppy, almost like every phoneme isn't captured. Now, maybe they (the phonemes) are all captured, or maybe they actually aren't - but the fact that the sound per word is _exactly_ the same, every time, possibly means that each sound is a precise substitute for the 'full' or 'slow' word, meaning that any introduced variation from a "natural" voice could actually make the 8x speech unintelligible.

Hope the author can shed a bit of light, it's so neat! I remember ~20 years ago the Sidekick (or a similar phone) seemed to be popular in blind communities because it also had settings to significantly speed up TTS, which someone let me listen to once, and it sounded just as foreign as the recording in TFA.


Yeah, that bit about each phoneme sounding exactly the same every time really made a lot of sense. Even if the TTS phoneme sounds nothing like a human would say it, once you've heard it enough times, you just memorize it.

I guess sounding "natural" really just amounts to adding variation across the sentence, which destroys phoneme-level consistency.


> When I listened to the voice sample in that section of the article, it sounds very choppy and almost like every phoneme isn't captured.

Every syllable is being captured, just sped up so that the pauses between them are much smaller than usual.
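As an aside, you can hear what naive time-compression (as opposed to native high-rate synthesis) sounds like by speeding up an ordinary recording, e.g. with ffmpeg's atempo filter; the stages are chained because older builds cap each one at 2x:

    # 4x speedup with pitch preserved (speech.wav is a hypothetical input file)
    ffmpeg -i speech.wav -filter:a "atempo=2.0,atempo=2.0" speech_4x.wav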


I've added a section about TTS voices to the post, see https://neurrone.com/posts/software-development-at-800-wpm/#...



I didn't post this onto HN, so I only just found out about this thread.

Thanks for mentioning the margin issue, I've tried fixing it now. Let me know if it's still an issue.

> I wonder how quickly ChatGPT replaces much of the customized tools here?

Probably not many of them. It's prone to hallucinations, and the latency involved in getting a response means that I only use it when I have to.

Edit: typo


Awesome, looks great! Thanks for jumping into the thread here.

