> If you blur enough, every pixel is the same color, which obviously has destroyed all information.
Not true. The ASCII art people have sorted characters by their “shade”; if you know the foreground and background colour of a letter, and you know the single colour of pixel it blurs to, you can still work out what the letter was.
Blurring discards some information and obfuscates other information, but really, given how easy it is to reverse-engineer such things, we should measure the number of bits of information remaining.
If information in image + information about image > bits of information in sensitive data, then in theory you can recover the data.
So simply deleting the data from the image (e.g. blacking it out, removing reflections) is preferable. Saves you a lot of effort!
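The shade-lookup attack described above can be sketched in a few lines. This is my toy illustration, not anyone's actual tool: I assume black text (0) on a white background (255), "blurring" means averaging every pixel of a glyph into one value, and the 3x3 bitmaps are made up rather than taken from a real font.

```python
# Toy sketch: recover a character from its fully blurred (averaged) colour.
GLYPHS = {
    "i": [0, 1, 0,
          0, 1, 0,
          0, 1, 0],   # thin stroke: little ink, light shade
    "o": [1, 1, 1,
          1, 0, 1,
          1, 1, 1],   # ring: lots of ink, dark shade
    "t": [1, 1, 1,
          0, 1, 0,
          0, 1, 0],
}

def blurred_shade(bitmap, fg=0, bg=255):
    """Collapse a glyph to one pixel: the mean of its pixel values."""
    pixels = [fg if ink else bg for ink in bitmap]
    return sum(pixels) // len(pixels)

# Precompute shade -> character: the lookup table the comment describes.
SHADE_TO_CHAR = {blurred_shade(bm): ch for ch, bm in GLYPHS.items()}

def recover(shade):
    """Invert the blur by table lookup."""
    return SHADE_TO_CHAR.get(shade, "?")
```

As long as each glyph blurs to a distinct shade, the lookup inverts the blur exactly; no information was actually destroyed.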
> if you know the foreground and background colour of a letter, and you know the single colour of pixel it blurs to, you can still work out what the letter was.
Totally agreed, and I like what you said about information remaining - you start to get into fun information theory stuff.
For example, let's say you're given one of the 'shades' you mentioned. I give you #a9a9a9. That's 24 bits of information. However, if it's a blurred letter of black text on a white background, it's always going to be grayscale, and there are only 256 possible grayscale values - only 8 bits. Luckily, since there are only 26 lowercase English letters, we can easily fit them into our 256 values. Information is not destroyed! This is the shade->char map you were talking about.
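The counting in that paragraph checks out with a bit of arithmetic (the standard log2 measure of information, nothing specific to any colour value):

```python
import math

# A 6-digit hex colour like #a9a9a9 is three 8-bit channels: 24 bits.
print(3 * 8)                    # 24
# Knowing it's grayscale collapses that to one byte: 256 values, 8 bits.
print(math.log2(256))           # 8.0
# A lowercase letter needs only log2(26) ~ 4.7 bits, so a byte holds it
# with room to spare: the shade -> char map loses nothing.
print(round(math.log2(26), 2))  # 4.7
```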
But now, what if we blur enough to turn two letters into one shade? That one shade will still be between 0 and 255, but now there are 26*26=676 possibilities that could've created it. More inputs than we have outputs for. There's no way to fit more than 256 possible inputs into 8 bits of information, so some shade - say '98' - could be 3 or 4 different inputs. However, we can be very clever...
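That's the pigeonhole argument in miniature, and it's quick to check (my arithmetic, just making the comment's numbers concrete):

```python
import math

pairs = 26 * 26   # possible two-letter inputs
shades = 256      # possible 8-bit grayscale outputs

# 676 inputs need ~9.4 bits, but one shade only carries 8.
print(round(math.log2(pairs), 2))  # 9.4
# So at least one shade must be shared by ceil(676/256) = 3 letter pairs:
# the map is no longer invertible by lookup alone.
print(math.ceil(pairs / shades))   # 3
```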
We know that the letter 'u' almost always comes after 'q'. We know that 'x' never appears next to 'j'. We know lots of things, and can supplement the destroyed information with outside information.

We can actually change the whole context of this conversation now. We're not talking about blurring anymore; we're talking about fitting English text into fewer bits than ought to be possible. We're talking about compression. This is exactly what compression algorithms for English text do. There's a lot of redundant information in plaintext, just like there's a lot of redundant information in our images of text. An 8x8-pixel character glyph can easily be reduced to a single pixel.

However, there are limits. You can compress English text by 10x if it's simple enough, but you can't compress War and Peace into 5 bytes. You can blur text by a few pixels and not destroy information, but you can't blur a paragraph of 12px font by 500px and get your original information back.
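Both halves of that claim are easy to see with a standard compressor. A sketch using Python's zlib, with a repeated pangram standing in for "simple enough" English and random bytes standing in for data with no redundancy left to exploit:

```python
import os
import zlib

# Redundant English-like text compresses dramatically; random bytes don't.
text = b"the quick brown fox jumps over the lazy dog " * 100
noise = os.urandom(len(text))

text_ratio = len(text) / len(zlib.compress(text, 9))
noise_ratio = len(noise) / len(zlib.compress(noise, 9))

print(text_ratio > 10)    # True: the redundancy is recoverable structure
print(noise_ratio < 1.1)  # True: no redundancy, nothing left to squeeze
```

The same asymmetry is why a lightly blurred paragraph is recoverable (lots of residual redundancy) while a heavily blurred one is not (the redundancy itself has been averaged away).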