The mentioned Xerox copier incident was not an OCR failure, but the copier actively changed the numbers in the original image due to its image compression algorithm.
Brief: Xerox machines used template matching to recycle the scanned images of individual digits that recur in the document. In 2013, Kriesel discovered this procedure was faulty.
Rationale: This method can create smaller PDFs, advantageous for customers that scan and archive numerical documents.
Tech Problem: Xerox's template matching procedure was not reliable, sometimes "papering over" a digit with the wrong digit!
PR Problem: Xerox press releases initially claimed this issue did not happen in the factory default mode. Kriesel demonstrated this was not true, by replicating the issue in all of the factory default compression modes including the "normal" mode. He gave a 2015 FrOSCon talk, "Lies, damned lies and scans".
Exactly, in practice the alternatives are either blocky artifacts (JPEG and most other traditional codecs), blurring everything (learned codecs optimised for MSE) or "hallucinating" patterns when using models like GANs. However, in practice even the generative side of compression models is evaluated against the original image rather than only output quality, so the outputs tend to be passable.
To see what a lossy generator hallucinating patterns means in practice, I recommend viewing HiFiC vs original here: https://hific.github.io/
Tradtional lossy compressors have well-understood artifacts. In particular they provide guarantees such that you can confidently say that an object in the image could not be an artifact.