I think you're underselling ocrmypdf, which I use heavily:
1. Scan-only PDFs are shockingly common: I download papers all the time which are scan-only.
2. Non-scan-only PDFs often have garbage OCR. I assume this is because the OCR was done long ago and never redone since. (Tesseract has gotten a lot better over the years.) It is not that rare for me to use ocrmypdf to forcibly redo an OCR layer because the existing one is unusable.
3. ocrmypdf supports JBIG2 and, possibly for other reasons as well, generates smaller PDFs; this is true even for 'native' PDFs or ones with good OCR. I routinely see PDFs I download, hot off the presses just days before with presumably their latest and greatest publishing stack from major scientific publishers, which shrink by a third or a half or sometimes wind up as much as 10x smaller. Not being a PDF expert, I have no idea how they manage to waste so much space, but they manage it. I also found that the standard scan tools I was using, like imagescan or gscan2pdf or 1dollarscan, were not producing PDFs as small as ocrmypdf did.
4. ocrmypdf will also write PDF/A by default. It's true that most PDFs you download or create will probably be perfectly readable 50 years from now with no special effort... But it's nice to have that extra bit of archival compliance.
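For anyone who wants to try the four cases above, here is a rough sketch of how they map onto ocrmypdf flags. The helper name `ocrmypdf_command` and its `mode` labels are made up for illustration; the flags themselves (`--redo-ocr`, `--force-ocr`, `--skip-text`, `--optimize`, `--output-type`) are real ocrmypdf options, but the particular combination is my assumption, not something from the comment.

```python
import shlex


def ocrmypdf_command(src, dst, mode="scan", optimize=3):
    """Build an ocrmypdf argument list (hypothetical helper, not part of ocrmypdf)."""
    flags = {
        "scan": [],                # point 1: scan-only PDF, plain OCR
        "redo": ["--redo-ocr"],    # point 2: replace an existing, bad OCR layer
        "force": ["--force-ocr"],  # point 2, heavier: rasterize every page, OCR again
        "skip": ["--skip-text"],   # point 3: leave pages that already have text alone
    }
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        "ocrmypdf",
        *flags[mode],
        # Point 3: higher levels compress harder; JBIG2 output additionally
        # needs the optional jbig2enc program to be installed.
        "--optimize", str(optimize),
        # Point 4: PDF/A is already the default; spelled out here for clarity.
        "--output-type", "pdfa",
        src,
        dst,
    ]


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(ocrmypdf_command("paper.pdf", "paper.ocr.pdf", mode="redo")))
```

Running a plain `scan` on a PDF that already has text makes ocrmypdf refuse and ask you to pick one of the other three modes, which is why they are separate here.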
I agree on all points. I use the following one-liner in directories of PDFs to reduce their file size while retaining dimensions, not hurting readability, and keeping the embedded OCR text in place. It skips re-running the OCR. It's basically a recipe from the docs, I believe.
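A command in that spirit might look like the sketch below. To be clear, this is not the poster's one-liner: it assumes the docs recipe in question is the "optimize images without performing OCR" one (`--tesseract-timeout=0 --optimize 3 --skip-text`), and the `.small.pdf` output naming and the function names are invented for the example.

```python
import subprocess
from pathlib import Path


def shrink_command(src, dst):
    """ocrmypdf arguments that optimize images but leave any existing text layer alone."""
    return [
        "ocrmypdf",
        "--tesseract-timeout=0",  # give Tesseract no time, so no new OCR is produced
        "--optimize", "3",        # most aggressive optimization level
        "--skip-text",            # do not touch pages that already have text
        str(src),
        str(dst),
    ]


def shrink_directory(directory, dry_run=True):
    """Build (and optionally run) one command per PDF in `directory`."""
    commands = []
    for src in sorted(Path(directory).glob("*.pdf")):
        if src.stem.endswith(".small"):
            continue  # skip outputs from an earlier run
        dst = src.with_name(src.stem + ".small.pdf")  # hypothetical naming scheme
        cmd = shrink_command(src, dst)
        commands.append(cmd)
        if not dry_run:
            # check=False: one failing PDF should not abort the whole batch
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    # Dry run by default: only show what would be executed in the current directory.
    for cmd in shrink_directory("."):
        print(" ".join(cmd))
```

Writing to a separate output file rather than overwriting in place makes it easy to compare sizes and spot-check readability before deleting the originals.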