OCR Labeling for PDFs

Use this template to perform region-level OCR directly on native PDFs.

The Pdf tag renders multi-page documents (up to 100 pages) with zoom and rotation, while OcrLabels lets you draw bounding boxes, assign labels, and capture editable OCR text per region.

Each region stores normalized coordinates, rotation, and a page index, making outputs reliable for downstream extraction tasks.
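
To make the role of normalized coordinates concrete, here is a minimal sketch of scaling one region to absolute page units. The `Region` field names and the percent-based (0-100) normalization are assumptions for illustration, not the confirmed export schema; inspect an exported annotation before relying on them. Rotation is carried along but not applied.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical field names; the real export schema may differ.
    x: float          # left edge, assumed percent of page width (0-100)
    y: float          # top edge, assumed percent of page height (0-100)
    width: float      # box width, assumed percent of page width
    height: float     # box height, assumed percent of page height
    rotation: float   # degrees; not applied in this sketch
    page_index: int   # which page the box belongs to
    text: str         # OCR text captured for the region


def to_page_units(region, page_width, page_height):
    """Scale a percent-normalized box to absolute units (for example, PDF points)."""
    left = region.x / 100.0 * page_width
    top = region.y / 100.0 * page_height
    right = left + region.width / 100.0 * page_width
    bottom = top + region.height / 100.0 * page_height
    return (left, top, right, bottom)


# Example values are made up; 612 x 792 is a US Letter page in points.
region = Region(x=10.0, y=20.0, width=50.0, height=5.0,
                rotation=0.0, page_index=0, text="Total: 42.00")
box = to_page_units(region, page_width=612.0, page_height=792.0)
print(region.page_index, box)
```

Because the stored values are relative to the page, the same region can be mapped onto any rendering of that page, whatever its zoom level or resolution.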

Ideal for document intelligence, QA on OCR output, and structured data capture workflows.

Enterprise

This template is available only in Label Studio Enterprise.

Labeling Configuration

<View>
  <Header value="Select text to correct" size="4"/>
  <OcrLabels name="ocr" toName="pdf">
    <Label value="Typo" />
    <Label value="Incorrect Amount" />
    <Label value="Incorrect Name" />
  </OcrLabels>
  <Pdf name="pdf" value="$pdf"/>
</View>

<!-- {
  "data": {
    "pdf": "/static/samples/opossum-cuteness.pdf"
  }
} -->

About the labeling configuration

Pdf

The Pdf tag displays your PDF natively in Label Studio, so you can zoom in and rotate pages as needed.

PDFs of up to 100 pages are supported.

OcrLabels

The OcrLabels tag is used only with the Pdf tag and lets you draw bounding boxes around text. Note that the PDF must have a text overlay for this to work. To check, try highlighting text in the PDF with your cursor.

Select the text under the Regions panel to correct it.

Input data

{
    "data": {
      "pdf": "/static/samples/opossum-cuteness.pdf"
    }
}
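
If you have many PDFs, the task shape above can be generated rather than written by hand. The following is a minimal sketch: the `build_tasks` helper and the second URL are hypothetical, and only the `{"data": {"pdf": ...}}` structure comes from the example above. The `pdf` key must match the `$pdf` variable in the labeling configuration.

```python
import json


def build_tasks(pdf_urls):
    """Wrap each PDF URL in the {"data": {"pdf": ...}} task shape."""
    return [{"data": {"pdf": url}} for url in pdf_urls]


# The second URL is a placeholder used only for illustration.
urls = [
    "/static/samples/opossum-cuteness.pdf",
    "https://example.com/docs/invoice-001.pdf",
]

tasks = build_tasks(urls)
tasks_json = json.dumps(tasks, indent=2)
print(tasks_json)
```

The printed JSON is a list with one task per PDF, which can be saved to a file for import.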

Related tags

Header
Pdf
