Generative OCR optimization

After partitioning, you can have a vision language model (VLM) improve the fidelity of the text blocks that Unstructured produced during its partitioning phase.

Each of the following examples shows Unstructured’s initial output for a text block, followed by the more accurate version of that text block after optimization with Claude Sonnet 4. Irrelevant lines of output have been omitted for brevity.

Example 1: Vertical watermarked text

Before (vertical watermarked text, represented incorrectly):

{
    "...", "...",
    "text": "3 2 0 2 t c O 9 2 ] V C . s c [ 2 v 9 0 8 6 1 . 0 1 3 2 : v i X r",
    "...", "..."
}

After (vertical watermarked text, now represented correctly from the original content):

{
    "...", "...",
    "text": "arXiv:2310.16809v2 [cs.CV] 29 Oct 2023",
    "...", "..."
}
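One way to read this example: the "Before" text looks like the watermark's characters captured in reverse order, with the leading "a" lost. The following Python sketch is only an illustration of that observation and is not how Unstructured or the VLM performs the correction. It reverses the garbled characters and compares them with the corrected text.

```python
before = "3 2 0 2 t c O 9 2 ] V C . s c [ 2 v 9 0 8 6 1 . 0 1 3 2 : v i X r"
after = "arXiv:2310.16809v2 [cs.CV] 29 Oct 2023"

# Reverse the space-separated characters of the garbled text.
reversed_before = "".join(reversed(before.split()))

# Remove spaces from the corrected text for a like-for-like comparison.
compact_after = after.replace(" ", "")

# The garbled text is missing the leading "a", so compare from the
# second character of the corrected text onward.
matches = reversed_before == compact_after[1:]

print(reversed_before)
print(matches)
```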

Example 2: Hyperlink

Before (hyperlink, represented incorrectly):

{
    "...", "...",
    "text": "con/Yuliang-Liu/MultinodalOCR|",
    "...", "..."
}

After (hyperlink, now represented correctly from the original content):

{
    "...", "...",
    "text": "https://github.com/Yuliang-Liu/MultimodalOCR",
    "...", "..."
}

Example 3: Chinese characters

Before (Chinese characters, represented incorrectly):

{
    "...", "...",
    "text": "GT SHE GPT4-V: EHES",
    "...", "..."
}

After (Chinese characters, now represented correctly from the original content, expressed as Unicode):

{
    "...", "...",
    "text": "GT : \u91d1\u724c\u70e7\u814a GPT4-V: \u6587\u9759\u5019\u9e1f",
    "...", "..."
}
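The "After" text in this example uses JSON Unicode escape sequences (`\uXXXX`) in place of the Chinese characters. As a minimal sketch, the following Python snippet shows that a standard JSON parser turns those escapes back into readable characters. The one-field element here is a simplified assumption for illustration; real Unstructured output contains additional fields that were omitted above.

```python
import json

# A simplified element that contains only the "text" field from the example.
raw = r'{"text": "GT : \u91d1\u724c\u70e7\u814a GPT4-V: \u6587\u9759\u5019\u9e1f"}'

# json.loads decodes each \uXXXX escape into the character it represents.
element = json.loads(raw)
text = element["text"]

print(text)
```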

Improve text fidelity with generative OCR

To produce generative OCR optimizations, select the following on the Details tab of the settings pane for an Enrichment node in a workflow:

Image under Input Type.
One of the following providers and models:
- Anthropic under Provider and any choice under Model
- OpenAI under Provider and any choice under Model
Generative OCR under Task.

The Generative OCR enrichment appears under the Input Type of Image, even though this is not an image-related enrichment. This is a known issue and will be addressed in a future release.

You can change a workflow’s image description settings only through Custom workflow settings.

For workflows that use chunking, place the Chunker node after all Enrichment nodes. Placing the Chunker node before an image descriptions Enrichment node could cause incomplete image descriptions, or none at all, to be generated.
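The node ordering rule above can be sketched as a small check. This is a hypothetical helper for illustration only: the function name and the representation of a workflow as an ordered list of node type names are assumptions, not part of Unstructured's API.

```python
def chunker_is_after_enrichments(node_types):
    """Return True if no Enrichment node appears after the first Chunker node.

    node_types is a hypothetical ordered list of node type names,
    for example ["Partitioner", "Enrichment", "Chunker"].
    """
    if "Chunker" not in node_types:
        # Nothing to check when the workflow does not use chunking.
        return True
    first_chunker = node_types.index("Chunker")
    return "Enrichment" not in node_types[first_chunker + 1:]


recommended = ["Partitioner", "Enrichment", "Enrichment", "Chunker"]
problematic = ["Partitioner", "Chunker", "Enrichment"]

print(chunker_is_after_enrichments(recommended))
print(chunker_is_after_enrichments(problematic))
```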

Unstructured can produce generative OCR optimizations for workflows that are configured as follows:

With a Partitioner node set to use the Auto or High Res partitioning strategy, and with a generative OCR optimization Enrichment node added.
With a Partitioner node set to use the VLM partitioning strategy. In this case, no generative OCR optimization node is needed (or allowed).

Unstructured never produces generative OCR optimizations for workflows with a Partitioner node set to use the Fast partitioning strategy.
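The configuration rules above can be summarized as a small decision function. This is a sketch under stated assumptions: the function name, the strategy strings, and the choice to raise an error for a disallowed combination are hypothetical and are not taken from Unstructured's API.

```python
def produces_generative_ocr(strategy, has_generative_ocr_node):
    """Return True if a workflow produces generative OCR optimizations.

    strategy: the Partitioner node's strategy ("Auto", "High Res", "VLM", or "Fast").
    has_generative_ocr_node: whether a generative OCR Enrichment node is present.
    """
    normalized = strategy.strip().lower()
    if normalized == "fast":
        # Fast never produces generative OCR optimizations.
        return False
    if normalized == "vlm":
        # VLM needs no generative OCR node, and none is allowed.
        if has_generative_ocr_node:
            raise ValueError("A generative OCR node is not allowed with VLM.")
        return True
    if normalized in ("auto", "high res"):
        # Auto and High Res require an added generative OCR node.
        return has_generative_ocr_node
    raise ValueError(f"Unknown partitioning strategy: {strategy!r}")


print(produces_generative_ocr("High Res", True))
print(produces_generative_ocr("Fast", True))
```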
