OCR Benchmark Leaderboard 2026 - Best Models and Workflow Fit

13 Feb 2026, 00:00 Z

This OCR benchmark leaderboard answers a practical shortlist question: which OCR models deserve your first test in 2026, and what you should check before trusting a public leaderboard.

If you searched for "OCR benchmark 2026", "OCR leaderboard", or "best OCR models 2026", start here. This page is the market map: the workflow page turns the shortlist into a deployment route, and the scan-heavy guide shows the page-type evidence behind the routing calls.

By February 2026, open OCR had become crowded enough that benchmark headlines alone could no longer settle the choice. Several compact vision-language models could already parse documents well. The harder question became where each one breaks.

If you already know you need a page-level routing answer, go straight to the workflow-fit guide: https://instavar.com/blog/ai-production-stack/Which_OCR_Model_Fits_Which_Workflow_in_2026.

For model-vs-model comparisons (GLM-OCR vs PaddleOCR, GLM-OCR vs Mistral OCR, and dots.ocr-1.5 routing decisions), use the merged workflow guide. The old dots comparison URL redirects there because the durable comparison logic now lives in one canonical OCR guide.

What we found

The top reported OCR models are now close enough on headline benchmarks that production fit matters more than tiny score gaps.
GLM-OCR and PaddleOCR-VL-1.5 still belong in the reported OmniDocBench shortlist.
Our hands-on read is more practical: Hunyuan is strongest when coordinates matter, DeepSeek helps when blank-page handling matters, FireRed is the best balanced operational choice, and GLM remains the fastest normal-case workflow.
dots.ocr-1.5 belongs in the OCR plus broader visual parsing lane, not as the default scanned-PDF model.
Use this page to build the first shortlist, then run a fixed page-type bake-off before rollout.

Update (Mar 2026):
The public shortlist should now be read with a second layer in mind: our newer full-50 workflow benchmark across Hunyuan, DeepSeek, GLM, and FireRed.
That benchmark does not replace the public leaderboard tables below, but it does change the deployment readout: Hunyuan leads on grounded output, DeepSeek is now the second grounded workflow and the strongest blank-page detector, FireRed remains the best balanced workflow, and GLM remains the fastest normal-case path.
For the practical routing answer across those workflows plus

1 Start here: which models belong in your first shortlist?

Your priority	Recommended first model to test	Why	Second model to test
Highest headline benchmark performance	GLM-OCR	Top reported OmniDocBench score among current open releases	PaddleOCR-VL-1.5
Highest grounded workflow output	HunyuanOCR	Strongest grounded workflow in the current full-50 hands-on run	DeepSeek-OCR-2
Best blank-page handling in the current hands-on workflow benchmark	DeepSeek-OCR-2	Only workflow in the four-model full-50 run to detect all `3/3` blank pages	HunyuanOCR
OCR plus broader image parsing (SVG, web, scene text)	dots.ocr-1.5	Extends beyond document parsing into multi-task vision-language parsing	GLM-OCR
Structure-sensitive Markdown OCR with formulas and table closure risks	FireRed-OCR	Public training story explicitly targets structural hallucination and syntax validity	GLM-OCR
Best speed/quality for heavy page volume	LightOnOCR-2	Strong reported OlmOCR-Bench + throughput profile	GLM-OCR

3.1 OmniDocBench snapshot

Model	Params	OmniDocBench (reported)	Notes	Source
GLM-OCR	0.9B	94.62	Strong all-round reported score; very recent release	GLM-OCR repo
PaddleOCR-VL-1.5	0.9B	94.50	Competitive accuracy with compact footprint	PaddleOCR-VL paper
HunyuanOCR	1B	94.10	High reported score and broad task framing	HunyuanOCR report
DeepSeek-OCR-2	3B MoE decoder (~500M active) + 80M image compressor	91.09	Notable jump over earlier DeepSeek OCR baseline in authors' report	DeepSeek-OCR-2 paper
FireRed-OCR	2B	92.94	Official March 2026 release frames it as the strongest end-to-end solution in its comparison slice; structural Markdown focus is the main differentiator	FireRed-OCR paper
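The claim that the top reported models are "close enough" is easy to check with simple arithmetic. A minimal sketch that ranks the reported scores from the table above and computes each model's gap to the model ranked just above it. The scores are copied as author-reported (not re-measured), the comparison assumes they were produced on the same OmniDocBench version and protocol, and the helper name `ranked_gaps` is illustrative.

```python
# Author-reported OmniDocBench scores copied from the table above.
REPORTED_OMNIDOCBENCH = {
    "GLM-OCR": 94.62,
    "PaddleOCR-VL-1.5": 94.50,
    "HunyuanOCR": 94.10,
    "FireRed-OCR": 92.94,
    "DeepSeek-OCR-2": 91.09,
}


def ranked_gaps(scores: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (model, score, gap to the model ranked just above), best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    rows = []
    previous = None
    for model, score in ranked:
        # round() hides binary floating-point noise such as 0.12000000000000455
        gap = 0.0 if previous is None else round(previous - score, 2)
        rows.append((model, score, gap))
        previous = score
    return rows


if __name__ == "__main__":
    for model, score, gap in ranked_gaps(REPORTED_OMNIDOCBENCH):
        print(f"{model:<18} {score:6.2f}  gap {gap:.2f}")
```

The top three models sit within about half a point of each other, which is the kind of spread a page-type bake-off on your own corpus can easily reorder.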

3.5 FireRed-OCR early evidence snapshot

Signal	Reported value	Source
OmniDocBench v1.5 overall	`92.94`	FireRed-OCR paper
FireRedBench overall	`74.62`	FireRed-OCR repo
Base foundation	`Qwen3-VL-2B-Instruct`	FireRed-OCR repo
Distinguishing method	Format-Constrained GRPO for formula syntax, table integrity, hierarchical closure, and text accuracy	FireRed-OCR repo
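The properties FireRed-OCR's training reportedly targets (table integrity, formula syntax, closure) can also be spot-checked on any model's output during a bake-off. A minimal sketch of two cheap structural checks, assuming Markdown pipe tables and LaTeX formulas in the output. This is not FireRed-OCR's reward function; the function names are hypothetical, and the checks deliberately ignore escaped pipes and other edge cases.

```python
def markdown_table_is_rectangular(markdown: str) -> bool:
    """True if every pipe-table row in the text has the same number of cells."""
    widths = []
    for line in markdown.splitlines():
        row = line.strip()
        if len(row) >= 2 and row.startswith("|") and row.endswith("|"):
            # Drop exactly one outer pipe on each side, then count cells.
            widths.append(len(row[1:-1].split("|")))
    return len(set(widths)) <= 1


def latex_braces_balanced(formula: str) -> bool:
    """True if { } braces close in order; escaped characters like \\{ are skipped."""
    depth = 0
    i = 0
    while i < len(formula):
        ch = formula[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a closing brace with no matching opener
                return False
        i += 1
    return depth == 0


if __name__ == "__main__":
    print(markdown_table_is_rectangular("| a | b |\n|---|---|\n| 1 | 2 | 3 |"))
    print(latex_braces_balanced(r"\frac{a}{b"))
```

Counting failures of checks like these per model gives a rough "structural hallucination" rate to set beside text accuracy.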

3.7 What the newer four-model workflow benchmark changed

Workflow	Mean sec/page	Blank pages detected	Total visual anchors	Practical readout
`FireRed`	`3.328`	`2/3`	`48`	Best balanced workflow
`GLM`	`1.252`	`0/3`	`57`	Fastest normal-case workflow
`Hunyuan`	`6.884`	`2/3`	`1517`	Strongest grounded workflow
`DeepSeek`	`17.591`	`3/3`	`926`	Second grounded workflow; strongest blank handling
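Mean seconds per page is easier to plan around when converted into pages per hour and a slowdown relative to the fastest workflow. A minimal sketch over the numbers in the table above; it assumes pages are processed one at a time on the benchmark hardware (the table does not state batching or concurrency), and the `summarize` helper is hypothetical.

```python
# Hands-on full-50 results copied from the workflow table above.
# name: (mean seconds per page, blank pages detected, blank pages in the set)
WORKFLOWS = {
    "FireRed": (3.328, 2, 3),
    "GLM": (1.252, 0, 3),
    "Hunyuan": (6.884, 2, 3),
    "DeepSeek": (17.591, 3, 3),
}


def summarize(workflows: dict[str, tuple[float, int, int]]) -> dict[str, dict[str, float]]:
    """Pages per hour, slowdown vs the fastest workflow, and blank-page recall."""
    fastest = min(sec for sec, _, _ in workflows.values())
    summary = {}
    for name, (sec, found, total) in workflows.items():
        summary[name] = {
            # Sequential processing assumed: one page every `sec` seconds.
            "pages_per_hour": round(3600.0 / sec, 1),
            "slowdown_vs_fastest": round(sec / fastest, 2),
            "blank_recall": found / total,
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(WORKFLOWS).items():
        print(name, row)
```

Under that assumption, GLM clears roughly 2,875 pages per hour while DeepSeek runs about 14x slower, which is the price of its `3/3` blank-page recall.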

3.2 OlmOCR-Bench snapshot

Model	OlmOCR-Bench (reported)	Context	Source
LightOnOCR-2-1B	83.2 +/- 0.9	Reported as strongest in this comparison slice	LightOnOCR-2 paper
Chandra-9B	81.7 +/- 0.9	Large model baseline in same evaluation	LightOnOCR-2 paper
olmOCR-2-8B	80.4 +/- 1.1	Strong open baseline with robust ecosystem support	LightOnOCR-2 paper
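The `+/-` values matter as much as the headline numbers. A minimal sketch that treats each reported `score +/- spread` as a symmetric interval and checks which pairs overlap. The source table does not say whether the spread is a standard error or a confidence interval, so overlap here is only a rough screen, not a significance test; the helper names are hypothetical.

```python
# Reported OlmOCR-Bench results from the table above: (score, +/- spread).
REPORTED = {
    "LightOnOCR-2-1B": (83.2, 0.9),
    "Chandra-9B": (81.7, 0.9),
    "olmOCR-2-8B": (80.4, 1.1),
}


def interval(score: float, spread: float) -> tuple[float, float]:
    """Turn 'score +/- spread' into a (low, high) interval."""
    return (score - spread, score + spread)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two reported intervals share any range."""
    lo_a, hi_a = interval(*a)
    lo_b, hi_b = interval(*b)
    return max(lo_a, lo_b) <= min(hi_a, hi_b)


if __name__ == "__main__":
    names = list(REPORTED)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            print(first, "vs", second, "overlap:", overlaps(REPORTED[first], REPORTED[second]))
```

By this screen, LightOnOCR-2-1B and Chandra-9B overlap, while LightOnOCR-2-1B and olmOCR-2-8B do not.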

What we found

How to use this page

1 Start here: which models belong in your first shortlist?

2 What changed in OCR by February 2026

3 What the benchmark evidence says (reported)

3.0 Why one OCR leaderboard score is not enough

3.1 OmniDocBench snapshot

3.5 FireRed-OCR early evidence snapshot

3.6 What hands-on evaluation changed

3.7 What the newer four-model workflow benchmark changed

3.2 OlmOCR-Bench snapshot

3.3 Throughput snapshot

3.4 dots.ocr-1.5 early evidence snapshot (author-reported)

4 Model fit by use case

4.1 Use-case fit matrix

4.2 Adoption and maturity signals (Feb 13, 2026 snapshot)

5 A practical evaluation protocol (6 core models + challengers)

5.1 Preflight gates (before benchmarking)

5.2 50-page stratified set

5.3 Ground-truth package per page

5.4 Inference protocol (same policy for all models)

5.5 Metrics

5.6 Pass/fail thresholds

5.7 Weighted winner rule

5.8 Deployment outcome template

6 Risks and caveats before production rollout

7 Conclusion

Sources

3.3 Throughput snapshot

Model	Throughput (pages/s, reported)	Hardware context	Source
LightOnOCR-2-1B	5.71	Single H100 context in authors' report	LightOnOCR-2 paper
DeepSeek-OCR family	Varies by mode and output format	Public demos emphasize extraction-mode trade-offs	DeepSeek-OCR-2
GLM-OCR	Deployment-oriented serving options published; no single canonical throughput figure in repo	Depends on serving stack (`vLLM`, `SGLang`, Ollama)	GLM-OCR repo
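A reported pages-per-second figure becomes useful once it is turned into a capacity estimate. A minimal sketch that converts per-GPU throughput into a rough GPU count for a daily volume. The 2,000,000 pages/day volume and the 50% utilization factor are hypothetical, and 5.71 pages/s is the authors' single-H100 figure for LightOnOCR-2-1B, which your serving stack may not reproduce.

```python
import math


def gpus_needed(pages_per_day: int, pages_per_second_per_gpu: float, utilization: float = 1.0) -> int:
    """Rough GPU count for a daily page volume at a given per-GPU throughput.

    utilization < 1.0 discounts the reported peak for idle time, retries, and I/O.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    daily_capacity = pages_per_second_per_gpu * 86_400 * utilization  # seconds per day
    return math.ceil(pages_per_day / daily_capacity)


if __name__ == "__main__":
    # Hypothetical volume; 5.71 pages/s is the reported single-H100 figure.
    print(gpus_needed(2_000_000, 5.71))       # at the reported peak
    print(gpus_needed(2_000_000, 5.71, 0.5))  # assuming half of peak in practice
```

At the reported peak this gives 5 GPUs; at 50% utilization it gives 9, which is why measured throughput on your own stack should replace the paper figure before budgeting.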

4.1 Use-case fit matrix

Model	Choose first when	Why it wins there	Watch-outs
HunyuanOCR	You need dense grounded output for extraction or audit-heavy workflows	Strongest grounded workflow in the current full-50 hands-on benchmark	Slower than GLM or FireRed; raw output usually needs more normalization
DeepSeek-OCR-2	You need stronger grounding than GLM or FireRed plus strict blank-page handling	Second grounded workflow in the current hands-on benchmark and the only one to detect `3/3` blank pages	Slowest current workflow; current helper job adds startup overhead
GLM-OCR	You need a strong default baseline across mixed documents	Top-tier reported OmniDocBench result in compact size; multiple serving paths	Very new release; long-tail behavior still needs broad replication
dots.ocr-1.5	You need one model for OCR plus web/screen/scene/SVG parsing	Broad task coverage in a single 3B model family and strong reported release benchmarks	Many benchmark claims are currently model-card/repo reported for this version
FireRed-OCR	You need stricter structural Markdown behavior with formulas and tables	Public training story explicitly targets structural hallucination and syntactic validity	Early-cycle release; benchmark evidence is still author-reported and needs broad replication
DeepSeek-OCR-2	You need markdown-oriented output and mode switching	Reading-order-focused design and dual extraction modes (`Free OCR` and structured conversion)	Validate complex tables and multilingual edge cases on your own corpus
LightOnOCR-2-1B	You process high page volume and care about cost per page	Strong reported OlmOCR-Bench + throughput profile at 1B scale	Check performance on your language/script distribution
GutenOCR	You need reliable text-to-location grounding for downstream extraction	Grounded OCR is core design objective and first-class output	Weight license is CC-BY-NC; commercial use may be constrained
HunyuanOCR	You want one compact model for broad document tasks	Strong reported compact-model results across parsing-oriented tasks	Custom community license requires legal/compliance review
PaddleOCR-VL-1.5	Your inputs are messy scans/photos and you already run Paddle tooling	Near-frontier reported OmniDocBench score with robustness framing	Confirm accuracy on your distortion mix and template families
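The "choose first when" column can be read as a simple per-page routing rule. A minimal sketch that maps page needs to a first model to try, using the current hands-on lanes from the matrix above. The `PageNeeds` flags, the function name, and especially the precedence order of the checks are assumptions; how you derive the flags for each page is up to your pipeline.

```python
from dataclasses import dataclass


@dataclass
class PageNeeds:
    # Hypothetical per-page flags; not part of any model's API.
    needs_coordinates: bool = False       # grounded output for extraction or audit
    may_be_blank: bool = False            # strict blank-page handling required
    non_document_visual: bool = False     # web, screen, scene text, or SVG
    formula_or_table_heavy: bool = False  # structural Markdown matters most


def first_model_to_try(needs: PageNeeds) -> str:
    """Return the first model to test for a page; the check order is an assumption."""
    if needs.needs_coordinates:
        return "HunyuanOCR"
    if needs.may_be_blank:
        return "DeepSeek-OCR-2"
    if needs.non_document_visual:
        return "dots.ocr-1.5"
    if needs.formula_or_table_heavy:
        return "FireRed-OCR"
    return "GLM-OCR"  # strong default baseline for mixed documents


if __name__ == "__main__":
    print(first_model_to_try(PageNeeds(formula_or_table_heavy=True)))
```

Putting grounding first reflects the matrix's framing of HunyuanOCR as the strongest grounded workflow; a throughput-first deployment might check volume before anything else and route to LightOnOCR-2-1B instead.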

4.2 Adoption and maturity signals (Feb 13, 2026 snapshot)

Model	Maturity signal	What it means for rollout
GLM-OCR	Rapid early GitHub/HF uptake after launch	Fast-moving ecosystem, but still early for stability assumptions
dots.ocr-1.5	Fresh Feb 16, 2026 release with expanded task scope	High upside for multi-task use cases, but treat current results as early-cycle evidence
FireRed-OCR	March 2026 release with repo, model card, and paper all live at launch	Stronger evidence package than many brand-new challengers, but still early for stability assumptions
DeepSeek-OCR-2	Strong HF traction soon after release	Good community momentum for tooling and examples
HunyuanOCR	High visibility and broad activity across channels	More examples in the wild for compact deployment patterns
GutenOCR	Growing technical interest from doc-AI builders	Strong relevance for grounding-heavy extraction workflows
LightOnOCR-2-1B	Attention driven by 1B speed/quality profile	Good candidate for throughput-first deployments
PaddleOCR-VL-1.5	Benchmark-competitive and aligned with Paddle stack users	Lower integration risk if your team already uses Paddle

5.2 50-page stratified set

Slice	Pages	Why this slice matters
Clean digital single-column PDFs	8	Baseline text fidelity
Multi-column + sidebars + footnotes	8	Reading-order stress
Table-heavy documents	8	Structure fidelity and cell ordering
Formula-heavy documents	6	Formula extraction and sequencing
Forms/invoices/receipts	6	Region association and key-value linking
Messy photos/scans	10	Skew, warping, glare, and capture artifacts
Multilingual mixed-script pages	4	Language/layout stability
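The slice quotas above add up to exactly 50 pages. A minimal sketch of drawing that set reproducibly from a larger pool of labeled pages, so every model in the bake-off sees the same pages. The slice identifiers, the pool format (slice name to list of page IDs), and the `draw_stratified_set` helper are hypothetical.

```python
import random

# Quotas copied from the table above; the slice identifiers are illustrative.
SLICE_QUOTAS = {
    "clean_digital_single_column": 8,
    "multi_column_sidebars_footnotes": 8,
    "table_heavy": 8,
    "formula_heavy": 6,
    "forms_invoices_receipts": 6,
    "messy_photos_scans": 10,
    "multilingual_mixed_script": 4,
}


def draw_stratified_set(
    pool: dict[str, list[str]], quotas: dict[str, int], seed: int = 0
) -> dict[str, list[str]]:
    """Sample a fixed number of page IDs per slice; a fixed seed keeps the set reproducible."""
    rng = random.Random(seed)
    chosen = {}
    for slice_name, quota in quotas.items():
        # Sort first so the draw does not depend on the order pages were listed.
        candidates = sorted(pool.get(slice_name, []))
        if len(candidates) < quota:
            raise ValueError(f"{slice_name}: need {quota} pages, have {len(candidates)}")
        chosen[slice_name] = rng.sample(candidates, quota)
    return chosen


if __name__ == "__main__":
    demo_pool = {name: [f"{name}-{i:03d}" for i in range(20)] for name in SLICE_QUOTAS}
    picked = draw_stratified_set(demo_pool, SLICE_QUOTAS, seed=7)
    print(sum(len(pages) for pages in picked.values()))
```

Failing loudly when a slice is short keeps the set from silently drifting toward easy pages.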

3.4 dots.ocr-1.5 early evidence snapshot (author-reported)

Signal	Reported value	Source
Elo on olmOCR-Bench	1089.0	dots.ocr release repo
Elo on OmniDocBench (v1.5)	1025.8	dots.ocr release repo
Elo on XDocParse	1157.1	dots.ocr release repo
OmniDocBench (v1.5) TextEdit	0.031 (lower is better)	dots.ocr release repo
OmniDocBench (v1.5) ReadOrderEdit	0.029 (lower is better)	dots.ocr release repo
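TextEdit and ReadOrderEdit are edit-distance style metrics, which is why lower is better. A minimal sketch of character-level normalized edit distance, assuming normalization by the length of the longer string. OmniDocBench's exact preprocessing (whitespace handling, markup normalization, block matching) may differ, so treat this as an illustration of the idea, not a reimplementation of the benchmark.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit-cost insert, delete, and substitute."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, reference: str) -> float:
    """0.0 means identical; values near 1.0 mean the strings share little."""
    longest = max(len(prediction), len(reference))
    return 0.0 if longest == 0 else levenshtein(prediction, reference) / longest


if __name__ == "__main__":
    print(normalized_edit_distance("kitten", "sitting"))
```

Under this normalization, a score of 0.031 corresponds to roughly three character edits per hundred characters of reference text.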
