Mistral AI has launched Mistral OCR 3, its newest optical character recognition service, which powers the company's Document AI stack. The model, named mistral-ocr-2512, is built to extract interleaved text and images from PDFs and other documents while preserving structure, and it does this at an aggressive price of $2 per 1,000 pages, with a 50% discount when used through the Batch API.
What Is Mistral OCR 3 Optimized For?
Mistral OCR 3 targets typical enterprise document workloads. The model is tuned for forms, scanned documents, complex tables, and handwriting. It is evaluated on internal benchmarks drawn from real enterprise use cases, where it achieves a 74% overall win rate over Mistral OCR 2 across these document categories, using a fuzzy match metric against ground truth.
The model outputs markdown that preserves document layout, and when table formatting is enabled, it enriches the output with HTML-based table representations. This combination gives downstream systems both the content and the structural information needed for retrieval pipelines, analytics, and agent workflows.
Role in Mistral Document AI
OCR 3 sits within Mistral Document AI, the company's document processing capability that combines OCR with structured data extraction and Document QnA.
It now powers the Document AI Playground in Mistral AI Studio. In this interface, users upload PDFs or images and get back either clean text or structured JSON without writing code. The same underlying OCR pipeline is available via the public API, which lets teams move from interactive exploration to production workloads without changing the core model.
Inputs, Outputs, and Structure
The OCR processor accepts several document formats through a single API. The document field can point to:
- document_url for PDFs, pptx, docx, and more
- image_url for image types such as png, jpeg, or avif
- Uploaded or base64 encoded PDFs or images through the same schema
This is documented in the OCR Processor section of Mistral's Document AI docs.
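A minimal request sketch is shown below. It posts a publicly hosted PDF to the /v1/ocr endpoint with Python's requests library; the model name and the document_url field follow this article, while table_format and include_image_base64 are treated as optional flags whose exact names should be confirmed against the OCR Processor docs.

```python
# Minimal sketch of a synchronous OCR call against /v1/ocr (field names per
# this article; optional flags are assumptions to verify in the docs).
import os
import requests

API_KEY = os.environ["MISTRAL_API_KEY"]  # assumes the key is set in the environment

payload = {
    "model": "mistral-ocr-2512",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/sample-invoice.pdf",  # placeholder URL
    },
    "table_format": "html",        # ask for HTML table reconstruction (assumed flag name)
    "include_image_base64": True,  # return embedded images as base64 (assumed flag name)
}

resp = requests.post(
    "https://api.mistral.ai/v1/ocr",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
ocr_result = resp.json()
print("Pages returned:", len(ocr_result["pages"]))
```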
The response is a JSON object with a pages array. Each page contains an index, a markdown string, a list of images, a list of tables when table_format="html" is used, detected hyperlinks, optional header and footer fields when header or footer extraction is enabled, and a dimensions object with the page size. There is also a document_annotation field for structured annotations and a usage_info block for accounting information.
When images and HTML tables are extracted, the markdown includes placeholders for them, such as [tbl-3.html](tbl-3.html) for tables. These placeholders are mapped back to actual content using the images and tables arrays in the response, which simplifies downstream reconstruction.
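Continuing from the ocr_result of the previous sketch, the snippet below shows one way to stitch a page's markdown back together by substituting the table and image placeholders with the payloads carried in the tables and images arrays. The field names inside those arrays (id, html, image_base64) are assumptions for illustration, not confirmed names from the API reference.

```python
# Hedged sketch: rebuild each page's markdown by resolving placeholders against
# the images and tables arrays (field names id/html/image_base64 are assumed).
def rebuild_page(page: dict) -> str:
    markdown = page["markdown"]

    # Table placeholders look like [tbl-3.html](tbl-3.html); swap in the HTML body.
    for table in page.get("tables", []):
        placeholder = f"[{table['id']}]({table['id']})"
        markdown = markdown.replace(placeholder, table["html"])

    # Image placeholders are markdown image links; swap in a base64 data URI.
    for image in page.get("images", []):
        placeholder = f"![{image['id']}]({image['id']})"
        data_uri = f"![{image['id']}](data:image/jpeg;base64,{image['image_base64']})"
        markdown = markdown.replace(placeholder, data_uri)

    return markdown

# Join pages into one document, e.g. for indexing in a retrieval pipeline.
full_document = "\n\n".join(rebuild_page(p) for p in ocr_result["pages"])
```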
Upgrades Over Mistral OCR 2
Mistral OCR 3 introduces several concrete upgrades relative to OCR 2. The public release notes emphasize four main areas.
- Handwriting: Mistral OCR 3 more accurately interprets cursive, mixed content annotations, and handwritten text placed on top of printed templates.
- Forms: It improves detection of boxes, labels, and handwritten entries in dense layouts such as invoices, receipts, compliance forms, and government documents.
- Scanned and complex documents: The model is more robust to compression artifacts, skew, distortion, low DPI, and background noise in scanned pages.
- Complex tables: It reconstructs table structures with headers, merged cells, multi-row blocks, and column hierarchies, and it can return HTML tables with correct colspan and rowspan tags so that layout is preserved, as the sketch below illustrates.
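As a rough illustration of why the colspan and rowspan tags matter downstream, the snippet below parses a made-up merged-cell table of the kind OCR 3 returns in its tables array; pandas.read_html expands the spans into a rectangular frame (an HTML parser such as lxml or BeautifulSoup must be installed).

```python
# Illustrative only: the HTML string is invented, standing in for an entry from
# the OCR response's tables array. read_html expands colspan/rowspan cells.
from io import StringIO
import pandas as pd

table_html = """
<table>
  <tr><th colspan="2">Amount</th><th rowspan="2">Currency</th></tr>
  <tr><th>Net</th><th>Tax</th></tr>
  <tr><td>100.00</td><td>20.00</td><td>EUR</td></tr>
</table>
"""

df = pd.read_html(StringIO(table_html))[0]  # returns a list of DataFrames
print(df)
```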


Pricing, Batch Inference, and Annotations
The OCR 3 model card lists pricing at $2 per 1,000 pages for standard OCR and $3 per 1,000 annotated pages when structured annotations are used.
Mistral also exposes OCR 3 through its Batch Inference API /v1/batch, which is documented under the batching section of the platform. Batch processing halves the effective OCR price to $1 per 1,000 pages by applying a 50% discount to jobs that run through the batch pipeline.
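The sketch below shows one plausible way to route OCR requests through batch processing: build a JSONL file with one OCR request body per line, upload it, then create a batch job pointed at /v1/ocr. The file purpose, the custom_id/body line format, and the /v1/batch/jobs path follow my reading of Mistral's batching docs and should be treated as assumptions to verify.

```python
# Hedged sketch of batch OCR for the discounted rate. Endpoint paths and the
# JSONL line shape are assumptions; confirm them against the batching docs.
import json
import os
import requests

API_KEY = os.environ["MISTRAL_API_KEY"]
BASE = "https://api.mistral.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# 1) One OCR request body per JSONL line, each tagged with a custom_id.
doc_urls = ["https://example.com/a.pdf", "https://example.com/b.pdf"]  # placeholders
jsonl_body = "\n".join(
    json.dumps({
        "custom_id": f"doc-{i}",
        "body": {
            "model": "mistral-ocr-2512",
            "document": {"type": "document_url", "document_url": url},
        },
    })
    for i, url in enumerate(doc_urls)
)

# 2) Upload the JSONL file with the batch purpose.
upload = requests.post(
    f"{BASE}/files",
    headers=HEADERS,
    files={"file": ("ocr_batch.jsonl", jsonl_body)},
    data={"purpose": "batch"},
    timeout=60,
)
upload.raise_for_status()
file_id = upload.json()["id"]

# 3) Create the batch job against the OCR endpoint.
job = requests.post(
    f"{BASE}/batch/jobs",
    headers=HEADERS,
    json={"input_files": [file_id], "endpoint": "/v1/ocr", "model": "mistral-ocr-2512"},
    timeout=60,
)
job.raise_for_status()
print("Batch job id:", job.json()["id"])
```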
The model integrates with two important features on the same endpoint, Structured Annotations and BBox Extraction. These let developers attach schema-driven labels to regions of a document and get bounding boxes for text and other elements, which is useful when mapping content into downstream systems or UI overlays.
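A hedged sketch of structured annotations on the same call is shown below. It attaches a small JSON schema through a document_annotation_format parameter and reads back the document_annotation field described earlier; the parameter name, the json_schema wrapper, and the invoice schema itself are assumptions to check against the annotations docs.

```python
# Hedged sketch: schema-driven document annotations on /v1/ocr. The
# document_annotation_format parameter and json_schema wrapper are assumed
# names; the document_annotation response field is described in this article.
import os
import requests

API_KEY = os.environ["MISTRAL_API_KEY"]

invoice_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "invoice_fields",
        "schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": "string"},
            },
            "required": ["invoice_number", "total_amount"],
        },
    },
}

annotated = requests.post(
    "https://api.mistral.ai/v1/ocr",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "mistral-ocr-2512",
        "document": {
            "type": "document_url",
            "document_url": "https://example.com/sample-invoice.pdf",  # placeholder
        },
        "document_annotation_format": invoice_schema,  # billed as annotated pages
    },
    timeout=120,
)
annotated.raise_for_status()
print(annotated.json().get("document_annotation"))
```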
Key Takeaways
- Model and role: Mistral OCR 3, named mistral-ocr-2512, is the new OCR service that powers Mistral's Document AI stack for page-based document understanding.
- Accuracy gains: On internal benchmarks covering forms, scanned documents, complex tables, and handwriting, OCR 3 achieves a 74% overall win rate over Mistral OCR 2, and Mistral positions it as state-of-the-art against both traditional and AI-native OCR systems.
- Structured outputs for RAG: The service extracts interleaved text and embedded images and returns markdown enriched with HTML-reconstructed tables, preserving layout and table structure so outputs can feed directly into RAG, agent, and search pipelines with minimal extra parsing.
- API and document formats: Developers access OCR 3 via the /v1/ocr endpoint or SDK, passing PDFs as document_url and images such as png or jpeg as image_url, and can enable options like HTML table output, header or footer extraction, and base64 images in the response.
- Pricing and batch processing: OCR 3 is priced at $2 per 1,000 pages and $3 per 1,000 annotated pages, and when used through the Batch API the effective price for standard OCR drops to $1 per 1,000 pages for large-scale processing.