India has a lot of languages, and people need access to something that allows them to do basic stuff with them. I don't think relying on the US is a long-term solution.
An example. I am into proofreading and language learning and am forced to rely on Claude/Gemini to extract text from old books because of the lack of good Indian models. I started with regular Tesseract, but its accuracy outside of the Latin alphabet is not that great. Qwen 3/3.5 is good with the Bombay style of Devanagari but craps the bed with the Calcutta style. And neither are great with languages like Bengali. In contrast, Claude can extract Bengali text from terrible scans and old printing with something like 99+ percent accuracy.
Models specifically targeted at Indian languages and content will perform better within that context, I feel.
This is a typical technical solution to a sociopolitical problem. The powers-that-be are not comfortable with the free-for-all that exists on the internet. All these laws are meant to fix that squeaky wheel, one ball-bearing at a time.
"Children" gets the Right to march behind you unquestioningly. "Misinformation/Nazis" does the same for the Left. This is now a perfect recipe for a shit sandwich.
I agree. But if you find a different way to protect the children, that normal people can understand and relate to ("It's like buying beer"), and still maintain privacy, you take away at least one leg of support for what a lot of states really want to do (remove anonymity).
It's better than the fatalism in your comment IMO.
What is the useful life of something like this compared to an RCC structure? Do you have to keep painting it to protect it from rust?
You do see steel used in mobile towers etc., because you may not be able to place an RCC structure of that height on top of a building not designed for those loads. And in single-storey workshops/sheds.
This is the building currently under construction that I mentioned. Notice (1) the two layers of paint, and (2) the bolts used instead of welds, compared to the steel-structure photos in the essay.
The issues with subscriptions to streaming services are manifold (if you ignore the gargantuan waste of time that mindless TV-watching is):
- the UI is deliberately crap
- the library is deliberately incomplete
- accessing content is deliberately complicated
I had an experience with this recently: my phone provider bundles 20+ OTT services into a single plan, within a single app that runs on your TV/phone/browser. The kicker: you can add stuff to a watch list, but the watch list is never exposed anywhere. While they want you to pay for stuff, they do not want you to be choosy about it.
YT has, to my mind, the best user interface of all the services I have tried.
Nice! His Shakespeare generator was one of the first projects I tried after ollama. The goal was to understand what LLMs were about.
I have been on an LLM binge this last week or so trying to build a from-scratch training and inference system with two back ends:
- CPU (backed by JAX)
- GPU (backed by wgpu-py). This is critical for me as I am unwilling to deal with the nonsense that is rocm/pytorch. Vulkan works for me. That is what I use with llama-cpp.
I got both back ends working last week, but the GPU back end was buggy. So this week has been about fixing bugs, refactoring the WGSL code, and making things more efficient.
I am using LLMs extensively in this process and they have been a revelation. Use a nice refactoring prompt and they are able to fix things one by one resulting in something fully functional and type-checked by astral ty.
What is there to misunderstand? It doesn't even install properly most of the time on my machine. You have to use a specific Python version.
I gave up on all tools that depend on it for inference. llama-cpp compiles cleanly on my system for Vulkan. I want the same simplicity to test model training.
pytorch is as easy as you are going to find for your exact use case. If you can't handle the requirement of a specific version of Python, you are going to struggle in software land. ChatGPT can show you the way.
I have been doing this for 25 years and no longer have the patience to deal with stuff like this. I am never going to install Arch from scratch by building the configuration by hand ever again. The same with pytorch and rocm.
Getting them to work and recognize my GPU without passing arcane flags was a problem. I could at least avoid the pain with llama-cpp because of its Vulkan support. pytorch apparently doesn't have a Vulkan backend, so I decided to roll my own wgpu-py one.
FWIW, I've been experimenting with LLMs for the last couple of years, and have exclusively built everything I do around llama.cpp exactly because of the issues you highlight. "gem install hairball" has gone way too far, and I appreciate shallow dependency stacks.
If you're not writing/modifying the model itself but only training, fine-tuning, and running inference, ONNX now supports these with basically any backend execution provider, without needing to get into dependency version hell.
What are your thoughts on using JAX? I've used TensorFlow and Pytorch and I feel like I'm missing out by not having experience with JAX. But at the same time, I'm not sure what the advantages are.
I only used it to build the CPU back end. It was a fair bit faster than the previous numpy back end. One good thing about JAX (unlike numpy) is that it also gives you access to a GPU back end if you have the appropriate stuff installed.
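For anyone curious, a minimal sketch of that pattern: you jit-compile a function once and JAX dispatches it to whichever backend is available. The function below is a toy of my own; only the `jax.jit`, `jax.numpy`, and `jax.default_backend` calls are the standard JAX API.

```python
# Toy example: the same jitted function runs on CPU by default,
# or on an accelerator if the corresponding JAX backend is installed.
import jax
import jax.numpy as jnp

@jax.jit  # compile on first call, reuse the compiled version after
def affine(w, x, b):
    return jnp.dot(w, x) + b

w = jnp.eye(3)          # 3x3 identity
x = jnp.arange(3.0)     # [0., 1., 2.]
b = jnp.ones(3)         # [1., 1., 1.]
y = affine(w, x, b)     # [1., 2., 3.]

print(jax.default_backend())  # "cpu" unless a GPU/TPU plugin is present
```

The nice part is that none of the array code changes when you switch backends; `jax.default_backend()` just reports something else.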
> The CEO is also more puritan than the pope himself considering the amount of censorship it has.
In that case, you should try OpenAI's gpt-oss!
Both models are pretty fast for their size and I wanted to use them to summarize stories and try out translation. But it keeps checking everything against "policy" all the time! I created a jailbreak that works around this, but it still wastes a few hundred tokens talking about policy before it produces useful output.
I started using this a couple of days ago. It is a fully functional replacement for what I have been doing with WhatsApp. About 30-40% of my network is on it now, and I have also created our Sanskrit channel on it.
What it is missing:
- E2E encryption for text messages
- Communities as a container for groups
- Chat exports
- UPI payment integration
Also, the servers are under pressure so messages can get delayed sometimes.
But Vembu has promised continuous development. So let's see.
I am using it regularly and do hundreds of messages every day across groups and contacts.
I have been planning to put out a quarterly Sanskrit newsletter for some time now, and was dreading having to deal with LaTeX. For basic stuff, LibreOffice PDF export works. But that is not a plain text workflow.
I then discovered typst and it is a breath of fresh air. Unicode/Dēvanāgarī support out-of-the-box, no installing gigabytes of packages, near-instant compilation.
I will post it on our website as well as reddit when it is ready. I am taking my time to ensure that it does not become a one-off thing and can continue for many quarters.
= Proofreading =
https://github.com/adhyeta-org-in/adhyeta-tools
provides image extraction from PDFs, OCR, and a basic but nice proofreading web UI.
Qwen 3/3.5 is good enough for OCR on books in Indic scripts, so that is what I am using. But you can configure the model you want to use.
I may add a Tesseract back end as well if necessary.
= Language Learning =
I have tried a few parallel text readers and was not satisfied by any of them. My website (https://www.adhyeta.org.in/) had a simple baked-in interface that I deleted soon after I developed it. However, this weekend, I sat down with Claude and designed one to my liking. I also ported the theming and other goodies from the website to this local reader. This will serve as a test bed for the Reader on the website itself.
LLMs now produce wonderful translations for most works. You can take an old Bengali book, have Claude/Gemini OCR a few pages and then also have it translate the content to English/Sanskrit. Then load it into the Reader and you are good to go!
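To make that workflow concrete, here is a hypothetical sketch of how the OCR'd original and its translation might be paired up for a parallel reader. The 1:1 paragraph alignment and the field names are purely illustrative; they are not the Reader's actual format.

```python
# Hypothetical parallel-text preparation: one translated paragraph
# per source paragraph, stored as simple JSON records. This only
# illustrates the idea, not the Reader's real schema.
import json

def align(source_paragraphs, target_paragraphs):
    """Pair source paragraphs 1:1 with their translations."""
    if len(source_paragraphs) != len(target_paragraphs):
        raise ValueError("paragraph counts must match for 1:1 alignment")
    return [{"src": s, "tgt": t}
            for s, t in zip(source_paragraphs, target_paragraphs)]

doc = align(["প্রথম অনুচ্ছেদ।"], ["The first paragraph."])
print(json.dumps(doc, ensure_ascii=False))
```

Anything shaped like this, whatever the real field names end up being, is trivial to produce from a few OCR'd pages plus an LLM translation pass.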
I will release the Reader this month. Claude is nice, but I do not like the way it writes code. It often misses edge cases and even some basic things, and I have to remind it to handle them. So I want to refactor/rearrange some stuff and test the functionality end-to-end before I put it online.