FIND-20260324-021 · 2026-03-24 · Innovation Veille
Comparing Top Open-Source OCR Solutions — Chandra wins on documents, tables and formulas
adhoc
HIGH
Avi Chawla (@_avichawla) published a benchmark comparing five leading open-source OCR solutions: DeepSeek OCR, Datalab Chandra, Qwen3-VL, Dots OCR, and Granite Docling. Datalab Chandra came out on top: it supports 40+ languages and handles text, tables, and formulas. The evaluation ran fully locally, with DeepEval as the framework and no cloud dependencies. All benchmark code is available on GitHub and Lightning AI. The comparison is directly relevant to the ODS PDF Engine and DocStore services, which require reliable document parsing.
Source
https://blog.dailydoseofds.com/p/comparing-the-top-open-source-ocr
ODS Impact
High relevance for PDF Engine (~/dev/specs/ods-platform/specs/pdf-engine/spec.md) and DocStore. ODS needs OCR for PDF text extraction, form field recognition in Form Engine, and document parsing in DocSign desktop. Chandra's strength on tables and formulas is especially valuable for KEBA/CLM contract documents. Granite Docling (IBM) also deserves evaluation for its Apache-2.0 license and enterprise backing. The fully local deployment model aligns with ODS data-sovereignty requirements (no tenant documents leak to the cloud).
Security Review
License: Apache-2.0 (Datalab Chandra, Granite Docling); other tools vary | Maintenance: ACTIVE | Risk: LOW | Recommendation: SAFE_TO_USE
Tags
ocr
pdf-engine
docstore
document-processing
ai-ml
open-source
local-inference