FIND-20260324-021 · 2026-03-24 · Innovation Veille

Comparing Top Open-Source OCR Solutions — Chandra wins on documents, tables and formulas

adhoc HIGH
Avi Chawla (@_avichawla) published a benchmark comparing five leading open-source OCR solutions: DeepSeek OCR, Datalab Chandra, Qwen3-VL, Dots OCR, and Granite Docling. Datalab Chandra came out on top: it supports 40+ languages and handles text, tables, and formulas. The evaluation ran fully locally, using DeepEval as the framework, with no cloud dependencies. All benchmark code is available on GitHub and Lightning AI. The comparison is directly relevant to the ODS PDF Engine and DocStore services, which require reliable document parsing.

Source

https://blog.dailydoseofds.com/p/comparing-the-top-open-source-ocr

ODS Impact

High relevance for PDF Engine (~/dev/specs/ods-platform/specs/pdf-engine/spec.md) and DocStore. ODS needs OCR for PDF text extraction, form field recognition in Form Engine, and document parsing in DocSign desktop. Chandra's strength on tables and formulas is especially valuable for KEBA/CLM contract documents. Granite Docling (IBM) also deserves evaluation for its Apache-2.0 license and IBM's enterprise backing. The fully local deployment model aligns with ODS data-sovereignty requirements (tenant documents are never sent to a cloud service).
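Before committing to one engine, ODS could score the candidates on its own documents. The following is a minimal sketch of such a harness, not the method used in the benchmark: it swaps in a plain character error rate (edit distance divided by reference length) instead of the DeepEval-based evaluation from the article. The function names, the engine callables, and the sample file name are all hypothetical; each real engine would need its own wrapper that takes a document path and returns extracted text.

```python
from typing import Callable, Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # drop a character from a
                current[j - 1] + 1,      # add a character to a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the ground-truth text."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def rank_engines(
    engines: Dict[str, Callable[[str], str]],
    samples: List[Tuple[str, str]],
) -> List[Tuple[str, float]]:
    """Return (engine name, mean CER) pairs, best (lowest CER) first.

    engines: hypothetical wrappers mapping a document path to OCR text.
    samples: (document path, ground-truth text) pairs.
    """
    scores = []
    for name, run_ocr in engines.items():
        rates = [character_error_rate(truth, run_ocr(path)) for path, truth in samples]
        scores.append((name, sum(rates) / len(rates)))
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Stub engines stand in for real OCR wrappers (assumption for illustration).
    samples = [("contract_001.pdf", "Total: 1,250.00 EUR")]
    stub_engines = {
        "engine_a": lambda path: "Total: 1,250.00 EUR",
        "engine_b": lambda path: "Tota1: 1,25O.00 EUR",
    }
    for name, cer in rank_engines(stub_engines, samples):
        print(f"{name}: CER={cer:.3f}")
```

Character error rate only measures plain-text fidelity. Table structure and formula accuracy, the areas where Chandra is reported to be strongest, would need separate metrics.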

Security Review

License: Apache-2.0 (Datalab Chandra, Granite Docling) / various | Maintenance: ACTIVE | Risk: LOW | Recommendation: SAFE_TO_USE

Tags

ocr pdf-engine docstore document-processing ai-ml open-source local-inference