I am a part of the Talon community mentioned here, use Orca, have contributed to the Rust atspi bindings and feel like I know Linux accessibility quite well.
It is true that in Wayland you can write protocol extensions or custom compositors to get around these limitations. However, what many fail to realize is that the primary challenge in Linux accessibility is not so much a technical problem as it is getting people to actually implement specs and care about them. Even with atspi itself, a standard that has existed for over a decade, major apps like Firefox often do not implement the atspi Collection interface. This is not a criticism but rather a practical observation: accessibility needs to be standardized and easy to implement for it to actually be of any use. Orca works on Wayland, but only in certain compositors. For assistive technology developers, this pattern of supporting specific compositors is not feasible. We need to support assistive tech generally, not just ad hoc extensions for certain types of disabilities.
Wayland has no concept of global coordinates or global key bindings. The protocol itself is designed around atomicity, which is a nice concept but fundamentally in conflict with the need of assistive technologies to inspect and control the state of the entire desktop. As such, atspi methods like get_accessible_at_point are impossible to implement on Wayland.
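To make concrete what that method needs, here is a conceptual sketch (hypothetical data and function names, not the real atspi API): get_accessible_at_point is essentially a hit test against globally positioned extents, and a Wayland client is never handed the global coordinates required to build that table for the whole desktop.

```python
# Conceptual sketch, NOT the real AT-SPI API: hit-testing a point
# against *global* screen extents. Wayland clients only ever see
# surface-local coordinates, so an AT cannot assemble this picture.

def accessible_at_point(extents, x, y):
    """Return the name of the deepest accessible whose global
    (x, y, width, height) rectangle contains the point."""
    hit = None
    for name, (ex, ey, ew, eh) in extents:
        if ex <= x < ex + ew and ey <= y < ey + eh:
            hit = name  # later entries are deeper in the tree
    return hit

# Hypothetical desktop: a window containing a button.
desktop = [
    ("window", (100, 100, 800, 600)),
    ("button", (120, 520, 200, 40)),
]
accessible_at_point(desktop, 150, 530)  # -> "button"
```

The real interface answers this over D-Bus per application, but without a compositor willing to expose global geometry, there is nothing to feed it.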
I agree that X11 cannot be carried on forever, but given the current state of Wayland, phasing out X11 will drastically harm the accessibility ecosystem. Accessibility is not a "nice to have"; it is essential to the mission of community inclusion and to the wider goals of adopting desktop Linux in education and government.
Accessibility requirements are also something the EU is taking quite seriously lately, and if FOSS wants to be part of something bigger than the home-enthusiast scene, it has to step up.
Awesome work! Oftentimes in the TTS space, human-likeness is given way too much emphasis at the expense of user access. Frankly, as long as a voice is clear and you listen to it for a while, the brain filters out most of the quirks you would perceive on a first pass. That is why many blind folks are still perfectly fine using espeak-ng: its other properties, like speed of generation and small size, make it worth it.
I've been using a custom AI audiobook generation program [0] with Piper for quite a while now and am very excited to look at integrating Kitten. Historically, Piper has been the only good option for a free, CPU-only local model, so I am super happy to see more competition in the space. Easy installation is a big deal, since Piper has historically had issues with that (hence why I had to add auto-installation support in [0]).
WebGPU actually generates the speech entirely in the browser. Web Speech is great too, but less practical if the model is complicated to set up and integrate with the speech API on the host.
The implementation of the Web Speech API usually involves the specific browser vendor calling out to their own, proprietary, cloud-based TTS APIs. I say "usually" because, for a time, Microsoft used their local Windows Speech API in Edge, but I believe they've stopped that and have largely deprecated Windows Speech for Azure Speech even at the OS level.
Just to be clear, are you really saying that text-to-speech on Windows is server-hosted rather than on-device?
You could do text-to-speech on a 1 MHz Apple //e using the 1-bit speaker back in the '80s (Software Automated Mouth), and MacinTalk was built into the Mac in 1984. I know it's built into both Mac and iOS devices and runs offline.
But I do see how cross platform browsers like Firefox would want a built in solution that doesn’t depend on the vendor.
If the application is still using the deprecated Microsoft Speech API (SAPI), it's being done locally, but that API hasn't received updates in like a decade and the output is considerably lower quality than what people expect to hear today.
Firefox on Windows is one such application that still uses SAPI. I don't know what it uses on other operating systems. On Android, I imagine it uses whatever the built-in OS TTS API is, which likely goes through Google Cloud.
But anything that sounds at all natural, from any of the OS or browser vendors, is going through some cloud TTS API now.
Fantastic work. My dream would be to use this for a browser audiobook generator for epubs. I made a cli audiobook generator with Piper [0] that got some traction and I wanted to port it to the browser, but there were too many issues. [1]
Is the source available anywhere? It seems the assets/ folder is bundled JS.
In my opinion, there's a ton of opportunity for private, progressive web apps with this while WebGPU is still relatively newly implemented.
Would love to collaborate in some way if others are also interested in this
Are there any online communities dealing with general eye strain? Or blogs/videos that have helped people? I have had chronic eye pain for a while now and could really benefit from hearing what has helped others. I have not found doctors to be helpful.
I have a pretty normal Dell office monitor but not sure if I would benefit from an upgrade. I have relatively normal overhead lighting and try to take breaks or use a screen reader as much as I can, but haven't had much luck reducing pain.
While it's informative, I would proceed with caution. Many users there indulge in a level of obsession that is not helpful. The basics of reducing eye strain are actually simple:
1. Don't use a display at unnecessarily high brightness.
2. Make sure there's plenty of natural light around you (avoid LEDs if possible).
3. Take frequent breaks and look off in the distance. (If you're in a social setting, assume an air of mystery with your ponderous gaze.)
4. Reduce your level of stress. Stress makes nothing better and everything worse. Enjoy life! Stretch regularly to reduce muscle tension in the body.
5. Diet probably helps, but that's a whole can of worms. Don't obsess over it; just try to cut back on inflammatory foods.
There's a good chance it's due to dry eye. If so, you need to blink more. Get an eye compress (heat it up in the microwave, toss it on your eyes for 10 minutes). That can help release oils from the glands onto your eyes. Artificial tears can help with comfort but won't solve the underlying problem--we don't blink (enough) when we focus on screens that are close to our face.
I don't think that is the case, but I could be wrong. My eyes do not feel dry at all, and drops or hot washcloths haven't made much difference. Maybe the compress you speak of is better, though.
Hot washcloths don't maintain the heat long enough to release the oils. Decent eye compresses are $20. Here's a decent one. Certainly others work too.
I got a fancier one from Tear Restore that has little cutouts, so I can see while using it (instead of keeping my eyes closed). It may not work quite as well as the Bruder, but it lets me get things done while using it.
The standard 60Hz refresh rate of monitors is unlikely to produce any eyestrain. The refresh rate of the backlight could produce eyestrain and headaches.
Unfortunately, the exact frequency of the PWM used for backlight isn't often mentioned in the specs.
In general, anything above 500 Hz is better, as some people get headaches even at 250 Hz.
Many monitors allow adjustment of the individual R, G, and B components. This has been the single biggest help for me. I typically use R 45 / G 35 / B 15, or at night R 25 / G 15 / B 0, and that has helped me stay productive for longer without eye strain.
Perhaps off topic, not sure, but does anyone have opinions on the easiest way to add bilingual TTS? I am trying to make Mandarin language learning decks and having free automated TTS would be very useful.
I've tried out AwesomeTTS but found it a bit too complicated. Just want to automatically add TTS with one click ideally.
Does this program allow linking the same file to different locations on macOS vs. Linux? I have a few config files, like the VS Code settings.json, that end up at different paths but have the same content. (I see it allows for OS-specific files, but it's not clear if there is a way to keep them in sync when they are the same but just need different paths.)
I use stow at the moment and it is almost perfect, but I don't believe it can do that without multiple symlinks or something messy. I didn't like home-manager, and other dotfiles solutions seemed too bloated for my case.
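For illustration, the behavior I want is one tracked copy linked to per-OS targets. A minimal sketch in Python, assuming the standard VS Code settings paths, with a scratch directory standing in for $HOME so the demo has no side effects:

```python
# Sketch: keep one settings.json in a dotfiles repo and symlink it to
# the platform-specific VS Code location. The paths are the usual
# defaults; a temp dir stands in for $HOME here.
import os
import platform
import tempfile

home = tempfile.mkdtemp()  # stand-in for the real home directory
source = os.path.join(home, "dotfiles", "vscode", "settings.json")
os.makedirs(os.path.dirname(source))
with open(source, "w") as f:
    f.write('{ "editor.fontSize": 13 }\n')

# Pick the OS-specific target path for the same content.
if platform.system() == "Darwin":
    target = os.path.join(home, "Library", "Application Support",
                          "Code", "User", "settings.json")
else:
    target = os.path.join(home, ".config", "Code", "User", "settings.json")

os.makedirs(os.path.dirname(target))
os.symlink(source, target)  # one tracked file, per-OS location
```

With stow you would need one package per OS to get this effect, which is the messiness I was referring to.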
The workflow that you show in your repository is really not that different from Chezmoi's. If you configure a post-add hook in Chezmoi (https://www.chezmoi.io/reference/configuration-file/hooks/), you can do `chezmoi add ~/.config/whatever/whatever.conf` and have the file auto-added to the Chezmoi git repo + push it to some remote if you'd like.
I was also not thrilled about the idea of shipping an encrypted blob of important secrets around. I want my dotfiles to be public, so it's much nicer when the tool I use for managing my dotfiles natively integrates with 1password. Much of the templating functionality that I use from chezmoi is specifically for pulling stuff out of 1password.
Finally, the yadm "alternate files" functionality is nice, but I didn't really care about alternates for different OSes or hostnames or whatever. I wanted some configuration for my work machine(s) and some configuration for my personal machine(s) - that's it. That's the only distinction I care about. Chezmoi made it easy to prompt for the type of machine + change the things that get configured accordingly when bootstrapping a new machine (https://github.com/cweagans/dotfiles/blob/main/.chezmoi.toml...).
Did chezmoi make it easy to edit and then add files with changes? I hated always having to chezmoi this and that rather than doing what I want, then running yadm -u to pick up all the changes in my tracked files.
Pretty sure there are a couple of aliases you could create to get the workflow you want. I just edit my files on disk, `chezmoi add` the file, then periodically git commit and push them. You could create a post-add hook to automatically git add/commit/push, and I'm reasonably sure there's a chezmoi command to list all tracked files, so you could iterate through them and do whatever you want.
Because most people who read ebooks use a Kindle, and an e-reader running Android is by far the simplest way to read your collection of Kindle books on a non-Amazon e-reader: you can just download the Kindle app to access your collection.
Apparently the reason all the Android-based e-readers use Android 11 is that the e-ink driver does not support anything newer. Though it looks like they have moved to Android 12 on the latest devices, so they must have figured something out.
Is anyone using graph databases or SPARQL for these sorts of use cases? SPARQL is so powerful, but I'm not sure if there is enough metadata with Ethereum for it to be useful.
We see them for smaller transaction extracts, but most vendors here are too expensive for most teams at the scale you are likely thinking of. Instead, we see log, KV, DWH, etc. systems -- think Elastic, ClickHouse, Databricks, and the like. Some teams will stage daily batches into a graph DB, but having an expensive second system of record masquerading as a compute engine gets frustrating.
Separating scalable transaction storage from graph compute opens up a lot. The new Google Spanner Graph launch is interesting here, and on the OSS side, we have been working on GFQL so you can bounce between vectorized Python dataframe mode on small graphs and 1B-row GPU batches for bigger graphs.
I initially tried using Memgraph but faced stability issues and a lack of flow control tools during traversal, making it impossible to load one month’s worth of Ethereum blockchain data. The current solution, however, handles 10 times more data hosted on the same machine.
ClickHouse is an excellent database, provided you don't need to traverse graphs. Graph traversal requires many queries, and frequent disk interactions can significantly degrade performance.
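A toy sketch of why traversal is query-heavy on a flat store: each hop of a breadth-first expansion is another round trip against the edge table. The in-memory edge list below stands in for that table; names and data are made up for illustration.

```python
# Sketch: multi-hop traversal means one query per hop. An in-memory
# adjacency map stands in for an edge table in a columnar DB.
from collections import defaultdict

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]
adj = defaultdict(list)
for src, dst in edges:
    adj[src].append(dst)

def neighborhood(start, hops):
    """Breadth-first expansion: each hop here would be one
    round trip (e.g. a WHERE src IN (...) query) in a real DB."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        frontier = {d for s in frontier for d in adj[s]} - seen
        seen |= frontier
    return seen

neighborhood("a", 2)  # -> {"a", "b", "c", "e"}
```

A 6-hop walk over billions of edges is six of those scans, and if intermediate frontiers spill to disk, the latency compounds, which is what the comment above is getting at.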
For what it is worth, I still listen to and love human-read audiobooks! However, it is particularly useful to have an AI option for books that are too niche for there to be an incentive for an individual to narrate them. Lots of academic and personal texts fall into those categories.