How We Benchmark OCR Models on Scan-Heavy PDFs


12 Mar 2026, 00:00 Z

This post answers a methodology question: how do you benchmark OCR on scan-heavy PDFs without fooling yourself with a single headline score?

Use this article for the method, the evidence, and the limits. If you only need the current deployment answer, use the workflow-fit guide: https://instavar.com/blog/ai-production-stack/Which_OCR_Model_Fits_Which_Workflow_in_2026.

The short version

  • We ran an original 31-PDF, 1331-page scan-heavy pilot across GLM-OCR, dots.ocr-1.5, MonkeyOCR, PaddleOCR PP-StructureV3, and FireRed-OCR.
  • The important result was not a universal winner. It was a routing rule by page type.
  • The result changed when the FireRed-OCR wrapper was patched to handle near-blank pages and preserve page images.
  • The later public OCR story moved on to include Hunyuan and DeepSeek; this post remains the method and evidence record for the original pilot.

Update (Mar 2026): The newer workflow-boundary comparison now includes Hunyuan and DeepSeek alongside GLM and FireRed. At that newer workflow boundary, Hunyuan is the strongest grounded workflow, DeepSeek is the second grounded workflow and the strongest blank-page detector, FireRed remains the best balanced workflow, and GLM remains the fastest typical workflow.

Use the updated market-map and workflow-fit posts for the current shortlist and deployment answer:

  • Market map: https://instavar.com/blog/ai-production-stack/OCR_SOTA_Feb_2026_Open_Document_AI_Leaderboard
  • Workflow fit: https://instavar.com/blog/ai-production-stack/Which_OCR_Model_Fits_Which_Workflow_in_2026

Trust basis: the pilot ran self-hosted on a single RTX 3090 Ti 24 GB box. We kept the raw outputs, versioned the harness as it changed, and reviewed every page in the corpus at contact-sheet scale before rechecking the disputed pages at higher zoom.

Quick definitions:

  • Corpus means the set of PDFs and pages we tested.
  • Artifact score means a cleanup-cost score for obvious OCR damage, such as repeated lines or spaced-out words.
  • Visual audit means a human checked page images next to OCR output, instead of trusting the metric alone.
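
To make the artifact-score definition concrete, here is a minimal sketch of how such a cleanup-cost score could be computed from OCR text alone. It is not the scoring code used in the pilot. The function names, the equal weights, the 8-character minimum line length, and the spaced-word pattern are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical weights: illustrative only, not the values used in the pilot.
REPEAT_WEIGHT = 1.0
SPACED_WEIGHT = 1.0

# Four or more single letters separated by single spaces, e.g. "T o t a l".
SPACED_WORD = re.compile(r"(?<!\S)(?:[^\W\d_] ){3,}[^\W\d_](?!\S)")


def repeated_line_count(text: str, min_len: int = 8) -> int:
    """Count surplus copies of non-trivial lines (every copy beyond the first)."""
    lines = [ln.strip() for ln in text.splitlines()]
    counts = Counter(ln for ln in lines if len(ln) >= min_len)
    return sum(n - 1 for n in counts.values() if n > 1)


def spaced_word_count(text: str) -> int:
    """Count runs that look like one word split into single letters."""
    return len(SPACED_WORD.findall(text))


def artifact_score(text: str) -> float:
    """Weighted sum of obvious damage signals; higher means more cleanup."""
    return (REPEAT_WEIGHT * repeated_line_count(text)
            + SPACED_WEIGHT * spaced_word_count(text))


if __name__ == "__main__":
    damaged = (
        "Invoice number 12345\n"
        "Invoice number 12345\n"
        "Invoice number 12345\n"
        "T o t a l due\n"
    )
    print(artifact_score(damaged))
```

A score like this only flags damage that is visible in the text itself. It can also produce false positives, for example on legitimately repeated table rows or on letter-spaced headings, which is one reason to pair it with the visual audit rather than trust it alone.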
