I think you're underselling ocrmypdf, which I use heavily: 1. Scan-only PDFs are...

gumboshoes · on July 9, 2022

I agree on all points. I use the following one-liner in directories of PDFs to reduce their file size while retaining page dimensions, not hurting readability, and keeping the embedded OCR text in place. It skips re-running the OCR: --skip-text leaves pages that already have text alone, and --tesseract-timeout=0 gives Tesseract no time to do anything on the rest. It's basically a recipe from the docs, I believe.

find . -name '*.pdf' | parallel --tag -j 1 ocrmypdf --tesseract-timeout=0 --skip-text --jbig2-lossy --optimize 3 --output-type pdf '{}' '{}-sm.pdf'
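If you'd rather not depend on GNU parallel, here is a rough sketch of the same idea as a plain shell loop. It also avoids two quirks of the one-liner above: outputs get named report-sm.pdf instead of report.pdf-sm.pdf, and a second run won't pick up the earlier -sm.pdf files as inputs. The function names sm_name and shrink_pdfs are made up for illustration; the ocrmypdf flags are copied unchanged from the one-liner, and I'm assuming bash for the NUL-delimited read.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn "report.pdf" into "report-sm.pdf".
sm_name() {
  printf '%s-sm.pdf\n' "${1%.pdf}"
}

# Hypothetical wrapper: shrink every PDF under a directory (default: .),
# ignoring files that already end in "-sm.pdf" from a previous run.
shrink_pdfs() {
  local dir="${1:-.}" src
  find "$dir" -name '*.pdf' ! -name '*-sm.pdf' -print0 |
    while IFS= read -r -d '' src; do
      # Same flags as the one-liner; stdin is redirected so ocrmypdf
      # cannot consume the file list coming from find.
      ocrmypdf --tesseract-timeout=0 --skip-text --jbig2-lossy \
        --optimize 3 --output-type pdf \
        "$src" "$(sm_name "$src")" </dev/null
    done
}
```

Source the file and run shrink_pdfs . in the directory you want to process. It handles one file at a time, like -j 1 in the original.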