
I'm slightly shocked that (all?) modern OCR systems can't handle a perfectly clean image of Courier text with 100% reliability.

I wonder if reducing the font size for faster transmission made it worse? A larger font might have been easier to read, and would probably have saved time in the long run.

EDIT: Actually, looking at the output of the Fax to Binary Converter program, I think that's very likely. Even I'm not 100% sure whether that 8x6 glob of pixels is a 0 or a D.

Hmmm. If nothing else, what about search-and-replacing the Word doc to replace some of the most difficult characters with clearer ones, and then reversing the process on the other end? I mean, that's ridiculously complex, but not as complex as writing a custom Fax to Binary Converter app.
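
A rough sketch of that substitution idea, in case it helps anyone. It assumes the payload is uppercase hex text, which the post does not confirm, and the substitution pairs (D to X, B to K) and the names `SUBSTITUTIONS` and `make_tables` are made up for illustration. The only real requirement is that each replacement character never occurs in the original alphabet, otherwise the process can't be reversed.

```python
# Assumption: the faxed data is uppercase hex text.
HEX_ALPHABET = set("0123456789ABCDEF")

# Hypothetical pairs: swap glyphs that OCR might confuse (D vs 0, B vs 8)
# for letters that never appear in hex.
SUBSTITUTIONS = {"D": "X", "B": "K"}


def make_tables(subs, alphabet):
    """Build translation tables for the sending and receiving side."""
    for src, dst in subs.items():
        if dst in alphabet:
            # A replacement that already occurs in the data is not reversible.
            raise ValueError(f"{dst!r} already occurs in the source alphabet")
    if len(set(subs.values())) != len(subs):
        raise ValueError("two characters map to the same replacement")
    encode = str.maketrans(subs)
    decode = str.maketrans({dst: src for src, dst in subs.items()})
    return encode, decode


encode_table, decode_table = make_tables(SUBSTITUTIONS, HEX_ALPHABET)
sent = "0D8B00FF".translate(encode_table)   # applied before faxing
received = sent.translate(decode_table)     # applied after OCR
```

The check in `make_tables` is the part that matters: without it, a careless mapping would silently corrupt data on the way back.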



Author here. I was also surprised by this.

I simplified the story a bit for brevity. I actually tried a bunch of different font styles, including a 47 page fax using a pretty large size Courier (with only 72 characters per line). The screenshots in the blog post were taken after I had decided OCR wasn't working, so by then I was using a heavily reduced font size to optimize the transfer time. Hence the characters looked like barely legible blobs.

The Fax-to-Binary converter isn't doing anything particularly complicated with the image, just breaking it up into an accurately-aligned grid and hashing the pixel data of each tile.
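
For anyone curious, the grid-and-hash idea can be sketched roughly like this. This is not the actual converter code: it assumes the page is already a perfectly aligned, black-and-white bitmap held as a list of rows of 0/1 pixels, and that a `lookup` table from tile hash to character has been built separately (for example, by labelling each distinct tile once by hand). The names `tile_digest`, `grid_digests`, and `decode` are made up for this example.

```python
import hashlib


def tile_digest(image, left, top, tile_w, tile_h):
    """Hash the raw pixels of one tile; identical tiles give identical digests."""
    pixels = bytes(
        image[y][x]
        for y in range(top, top + tile_h)
        for x in range(left, left + tile_w)
    )
    return hashlib.sha256(pixels).hexdigest()


def grid_digests(image, tile_w, tile_h):
    """Split the image into a grid of fixed-size tiles and hash each one."""
    rows = len(image) // tile_h
    cols = len(image[0]) // tile_w
    return [
        [tile_digest(image, c * tile_w, r * tile_h, tile_w, tile_h)
         for c in range(cols)]
        for r in range(rows)
    ]


def decode(image, tile_w, tile_h, lookup):
    """Turn each row of tiles into text; unknown tiles become '?'."""
    return [
        "".join(lookup.get(digest, "?") for digest in row)
        for row in grid_digests(image, tile_w, tile_h)
    ]
```

Exact hashing only works when every copy of a character comes out pixel-for-pixel identical, which is why the grid alignment has to be accurate. A single shifted or noisy pixel produces a different digest and therefore an unknown tile.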

Replacing the characters in the document hadn't occurred to me at the time! It's a good idea, but for my programmer brain, writing this software was the easier (and more fun) solution :)


Just out of curiosity, would that old Mac OS have had the OCR-A or OCR-B fonts available? (I'm pretty sure Windows would have.) These were designed specifically for OCR (duh...)


Nope, not that I found. The set of fonts installed was _very_ limited. I can't remember the exact count, but I think it was fewer than 20. Nothing specialized like an OCR font.


> I simplified the story a bit for brevity. I actually tried a bunch of different font styles, including a 47 page fax using a pretty large size Courier (with only 72 characters per line).

Wow. That is very surprising. I'm baffled.



