I'm slightly shocked that (all?) modern OCR systems can't handle a perfectly clean image of Courier text with 100% reliability.
I wonder if reducing the font size for faster transmission made it worse? A larger font might have been easier to read, and it would probably have saved time in the long run.
EDIT: Actually, looking at the output of the Fax to Binary Converter program, I think that's very likely. Even I'm not 100% sure whether that 8x6 glob of pixels is a 0 or a D.
Hmmm. If nothing else, what about search-and-replacing the Word doc to replace some of the most difficult characters with clearer ones, and then reversing the process on the other end? I mean, that's ridiculously complex, but not as complex as writing a custom Fax to Binary Converter app.
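For what it's worth, the search-and-replace idea can be sketched in a few lines of Python. This is only an illustration: the `SUBSTITUTIONS` table and the function names `substitute` and `restore` are made up here, and which characters are actually hard to tell apart depends on the font and the scan. The one real constraint is that the replacement characters must not already occur in the text, otherwise the process cannot be reversed.

```python
# Hypothetical table: hard-to-read characters mapped to stand-ins that are
# assumed to be unused in the document. Pick the pairs to suit the font.
SUBSTITUTIONS = {"0": "X", "1": "Y", "5": "Z"}


def substitute(text, table=SUBSTITUTIONS):
    """Replace hard-to-read characters before sending."""
    clash = set(table.values()) & set(text)
    if clash:
        # A stand-in that already appears in the text would be
        # indistinguishable from a substituted character on the way back.
        raise ValueError(f"stand-in characters already in text: {sorted(clash)}")
    return text.translate(str.maketrans(table))


def restore(text, table=SUBSTITUTIONS):
    """Undo substitute() on the receiving end."""
    reverse = {stand_in: original for original, stand_in in table.items()}
    return text.translate(str.maketrans(reverse))
```

Running `restore(substitute(text))` gives back the original text whenever `substitute` does not raise.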
I simplified the story a bit for brevity. I actually tried a bunch of different font styles, including a 47-page fax using a fairly large Courier (with only 72 characters per line). The screenshots in the blog post were taken after I had decided OCR wasn't working, so by then I was using a heavily reduced font size to optimize the transfer time. Hence the characters looked like barely-legible blobs.
The Fax-to-Binary converter isn't doing anything particularly complicated with the image, just breaking it up into an accurately-aligned grid and hashing the pixel data of each tile.
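As a rough illustration of the grid-and-hash idea (not the actual converter, whose code isn't shown here), the sketch below assumes the image is already a 2D list of 0/1 pixels, perfectly aligned to a fixed tile size, and that every occurrence of a character produces a pixel-identical tile. The names `tile_hashes`, `learn_table`, and `decode_image` are invented for this example, and SHA-1 is just one convenient choice of hash.

```python
import hashlib


def tile_hashes(image, tile_w, tile_h):
    """Cut a 2D grid of 0/1 pixels into tiles and hash each tile's pixels."""
    rows = len(image) // tile_h
    cols = len(image[0]) // tile_w
    grid = []
    for r in range(rows):
        row_hashes = []
        for c in range(cols):
            # Flatten the tile row by row into bytes so it can be hashed.
            pixels = bytes(
                image[r * tile_h + y][c * tile_w + x]
                for y in range(tile_h)
                for x in range(tile_w)
            )
            row_hashes.append(hashlib.sha1(pixels).hexdigest())
        grid.append(row_hashes)
    return grid


def learn_table(image, tile_w, tile_h, text_rows):
    """Build a hash -> character table from an image whose text is known."""
    table = {}
    for hashes, text in zip(tile_hashes(image, tile_w, tile_h), text_rows):
        for tile_hash, char in zip(hashes, text):
            table[tile_hash] = char
    return table


def decode_image(image, tile_w, tile_h, table):
    """Translate each tile back to a character; unknown tiles become '?'."""
    return [
        "".join(table.get(tile_hash, "?") for tile_hash in hashes)
        for hashes in tile_hashes(image, tile_w, tile_h)
    ]
```

Exact hashing only works if two scans of the same character come out pixel-for-pixel identical within their tile. Any noise would produce a new hash, which in this sketch shows up as a `?` that needs to be labelled by hand.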
Replacing the characters in the document hadn't occurred to me at the time! It's a good idea, but for my programmer brain, writing this software was the easier (and more fun) solution :)
Just out of curiosity, would that old Mac OS have had the OCR-A or OCR-B fonts available? (Pretty sure Windows would have.) These were essentially designed for OCR (duh...)
Nope, not that I found. The set of fonts installed was _very_ limited. I can't remember the exact count, but I think it was fewer than 20. Nothing specialized like an OCR font.