Congratulations, I guess? I can't read your content.
But ... The machines can't either, so ... great job!
Although... Hmm! I just pasted it into Claude and got:
When text content gets scraped from the web, and used for ever-increasing training data to improve. Copyright laws get broken, content gets addressively scraped, and even though you might have deleted your original work, it might must show up because it got cached or archived at some point.
Now, if you subscribe to the idea that your content shouldn't be used for training, you don't have much say. I wondered how I personally would mitigate this on a technical level.
et tu, caesar?
In my linear algebra class we discussed the caesar cipher[1] as a simple encryption algorithm: Every character gets shifted by n characters. If you know (or guess) the shift, you can figure out the original text. Brute force or character heuristics break this easily.
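For the curious, the shift itself is only a few lines of Python. This sketch is illustrative, not code from the article:

```python
# Minimal sketch of the Caesar shift described above: each letter moves
# n places down the alphabet, wrapping around at the end.
def caesar(text, n):
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + n) % 26 + base))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)

# Shifting by n and then by -n recovers the original text.
assert caesar(caesar("attack at dawn", 3), -3) == "attack at dawn"
```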
But we can apply this substitution more generally to a font! A font contains a cmap (character map), which maps codepoints and glyphs. A codepoint defines the character, or complex symbol, and the glyph represents the visual shape. We scramble the font's codepoint-glyph-mapping, and adjust the text with the inverse of the scramble, so it stays intact for our readers. It displays correctly, but the inspected (or scraped) HTML stays scrambled. Theoretically, you could apply a different scramble to each request.
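A rough sketch of that scramble, assuming fontTools as the tooling (the excerpt doesn't say what was actually used, and the file names here are hypothetical):

```python
from fontTools.ttLib import TTFont  # assumed tooling, not confirmed by the article
import random

# Build a random permutation of the lowercase letters.
letters = [ord(c) for c in "abcdefghijklmnopqrstuvwxyz"]
shuffled = letters[:]
random.shuffle(shuffled)

# Rewire the font: codepoint `src` now draws the glyph that `dst` used to.
font = TTFont("original.ttf")  # hypothetical file name
for table in font["cmap"].tables:
    mapping = getattr(table, "cmap", None)  # format-14 subtables lack .cmap
    if mapping is None:
        continue
    original = dict(mapping)
    for src, dst in zip(letters, shuffled):
        if src in original and dst in original:
            mapping[src] = original[dst]
font.save("scrambled.ttf")

# Invert the permutation for the HTML: to *display* character d, the page
# source must contain the codepoint that the scrambled font renders as d.
inverse = {chr(dst): chr(src) for src, dst in zip(letters, shuffled)}
def scramble_text(text):
    return "".join(inverse.get(c, c) for c in text)
```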
This works as long as scrapers don't use OCR for handling edge cases like this, but I don't think it would be feasible.
I also tested if ChatGPT could decode a ciphertext if I'd tell it that a substitution cipher was used, and after some back and forth, it gave me the result: "One day Alice went down a rabbit hole,
How accurate is this?
Did you seriously just make things worse for screen reader users and not even ... verify ... it worked to make things worse for AI?
That’s the correct text of the article, as far as I can tell, though not the entirety of it. The author goes on to say that ChatGPT wasn’t able to parse out the underlying text.
Part of the reason it might be useful is not that “no AI can ever read it” (I’m sure a pentesting-focused Claude Code could get past almost any similar obfuscation), but that the completely automated, dumb scrapers stealing your content for AI training data can’t read it. For many systems, that’s more than enough.
That said, I recently completely tore apart my website and rebuilt it from the ground up because I wasn’t happy with how inaccessible it was. For many like me, sacrificing accessibility is not just a bad look, but plainly unacceptable.
I didn't use Claude Code. I just pasted it directly into the web interface and said "I can't read this, can you help?" and then excerpted the result so you sighted folks didn't have to reread; you could just verify the content matched.
So basically this person has put up a big "fuck you" sign to people like me... while at the same time not protecting their content from actual AI (if this technique actually caught on, it would be trivial to reverse in your data ingestion pipeline).
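To illustrate how trivial: the scrambled webfont carries its own key, assuming the scraper can obtain a clean copy of the same typeface. A hypothetical sketch with fontTools (file names invented):

```python
from fontTools.ttLib import TTFont

# Compare the scrambled font's cmap against a reference copy of the same
# typeface and invert the permutation. No LLM needed.
scrambled = TTFont("scrambled.ttf").getBestCmap()  # codepoint -> glyph name
reference = TTFont("original.ttf").getBestCmap()

glyph_to_codepoint = {g: cp for cp, g in reference.items()}
key = {cp: glyph_to_codepoint[g] for cp, g in scrambled.items()
       if g in glyph_to_codepoint}

def unscramble(text):
    # Map each scraped codepoint back to the character it actually displays.
    return "".join(chr(key.get(ord(c), ord(c))) for c in text)
```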
(He's broken mainstream browsers, too: Ctrl+F doesn't work on the page.)
GPT 5.2 extracted the correct text, but it definitely struggled: it took 3m36s, had to write a script to do it, and messed up some of the formatting. It actually found this thread, but rejected that as a solution in the CoT: "The search result gives a decoded excerpt, which seems correct, but I’d rather decode it myself using a font mapping."
I doubt it would be economical to decode unless significant numbers of people were doing this, but it is possible.
This is the point I was making downthread: no scraper will use 3m36s of frontier LLM time to get <100 KB of data. This is why his method would technically achieve what he asked for. Someone alluded to this further down the thread, but I wonder if one-to-one letter substitution specifically would still expose some extractable information to the LLM, even without decoding.
Yes, it's worse for screen readers; I listed that among the other drawbacks I acknowledged. I don't intend to apply this method anywhere else due to these drawbacks, because accessibility matters.
It's a proof of concept, and maybe a starting point for somebody else who wants to tackle this problem.
Can LLMs detect and decode the text? Yes, but I'd wager that data cleaning doesn't happen to the extent of decoding the text after scraping.
I didn’t think you did use Claude Code! I was just saying that with AI agents these days, even more thoroughly obfuscated text can probably be de-obfuscated without much effort.
I suppose I don’t know data ingestion that well. Is de-obfuscating really something they do? If I were maintaining such a pipeline and found the associated garbage data, I doubt I’d bother adding a step for the edge case of guessing the right Caesar cipher to make the text coherent. Unless I was fine-tuning a model for a particular topic and a critical resource/expert obfuscated their content, I’d probably just drop it and move on.
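For what it's worth, the "just drop it" step could be as cheap as a stopword-ratio check. This sketch is purely hypothetical, with an arbitrary word list and threshold:

```python
# Score a document against common English words and drop anything that
# looks like gibberish; one-to-one scrambled text scores near zero.
COMMON = {"the", "and", "to", "of", "a", "in", "is", "that", "it", "for"}

def probably_english(text, threshold=0.05):
    words = [w.strip(".,!?\"'()") for w in text.lower().split()]
    if not words:
        return False
    hits = sum(w in COMMON for w in words)
    return hits / len(words) >= threshold

# Ingestion step: if not probably_english(doc), skip it and move on.
```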
That said, after watching my father struggle deeply with the complex computer usage his job requires when he developed cataracts, I don’t see any such method as tenable. The proverbial “fuck you” to the disabled folks who interact with one’s content is deeply unacceptable. Accessible web content should be mandatory in the same way ramps and handicap parking are, if not more so. For that matter, it shouldn’t take seeing a loved one slowly and painfully lose their able body to give a shit about accessibility. Point being, you’re right to be pissed, and I’m glad this post got a direct response from somebody with personal experience needing accessible content so quickly after it went up.
You are missing his point. He is not saying that the Caesar cipher is unbreakable by LLMs. These web scrapers gather a very large amount of data to train new LLMs. It is not feasible to spend hundreds of thousands (millions?) of dollars running petabytes of random raw data through a frontier LLM before using it, just to catch the one person who might be using a cipher to obfuscate their data. That is the value proposition: make your data slightly harder to scrape, so that scrapers gathering LLM training data would rather let it go unused than invest in extracting it.
This is fairly accurate (from a skim read, close to but not quite 100%). The article describes fooling ChatGPT with a Caesar cipher, but doesn't include a full test of the obfuscation in practice.