
OCR Labeling for PDFs

Use this template to perform region-level OCR directly on native PDFs.

The Pdf tag renders multi-page documents (up to 100 pages) with zoom and rotation, while OcrLabels lets you draw bounding boxes, assign labels, and capture editable OCR text per region.

Each region stores normalized coordinates, rotation, and a page index, making outputs reliable for downstream extraction tasks.
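When regions feed a downstream extraction step, you usually need to map them back onto the page. The sketch below assumes the stored values follow Label Studio's rectangle convention for images: `x`, `y`, `width`, and `height` as percentages (0 to 100) of the page size, and `rotation` in degrees around the top-left corner. That convention, the `Region` dataclass, and the `to_absolute_corners` helper are all hypothetical here; check an actual exported result before relying on it.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical field names, assumed to mirror Label Studio's rectangle
    # convention: x, y, width, height as percentages (0-100) of the page size.
    x: float
    y: float
    width: float
    height: float
    rotation: float  # degrees, assumed to rotate around the top-left corner
    page: int        # index of the page the region belongs to


def to_absolute_corners(region, page_width, page_height):
    """Return the region's four corners in page units (pixels or points)."""
    # Scale the percentage values to absolute page units.
    x0 = region.x / 100.0 * page_width
    y0 = region.y / 100.0 * page_height
    w = region.width / 100.0 * page_width
    h = region.height / 100.0 * page_height

    theta = math.radians(region.rotation)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    corners = []
    # Offsets of each corner from the top-left anchor, before rotation.
    for dx, dy in ((0.0, 0.0), (w, 0.0), (w, h), (0.0, h)):
        # Rotate the offset around the anchor; the y axis points down, as on screen.
        rx = dx * cos_t - dy * sin_t
        ry = dx * sin_t + dy * cos_t
        corners.append((x0 + rx, y0 + ry))
    return corners


# A US Letter page is 612 x 792 points.
region = Region(x=10, y=20, width=30, height=5, rotation=0, page=0)
print(region.page, to_absolute_corners(region, page_width=612, page_height=792))
```

The corners use a top-left origin. PDF user space puts the origin at the bottom-left, so tools that work in native PDF coordinates need `y_pdf = page_height - y` for each corner.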

Ideal for document intelligence, QA on OCR output, and structured data capture workflows.

Enterprise

This template is available only in Label Studio Enterprise.


Labeling Configuration

<View>
  <Header value="Select text to correct" size="4"/>
  <OcrLabels name="ocr" toName="pdf">
    <Label value="Typo" />
    <Label value="Incorrect Amount" />
    <Label value="Incorrect Name" />
  </OcrLabels>
  <Pdf name="pdf" value="$pdf"/>
</View>

<!-- {
  "data": {
    "pdf": "/static/samples/opossum-cuteness.pdf"
  }
} -->

About the labeling configuration

  • Pdf

    Displays your PDF natively in Label Studio, so you can zoom in and rotate pages as needed.

    Supports PDFs of up to 100 pages.

  • OcrLabels

    Works only with the Pdf tag and lets you draw bounding boxes around text. The PDF must have a text layer for this to work; to check, try highlighting text in the PDF with your cursor.

    To correct the OCR text of a region, select that region in the Regions panel and edit its text.
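Because OcrLabels works only with the Pdf tag, a quick local check can catch a `toName` that points at the wrong tag before you save a config. The `check_ocr_labels` helper below is a hypothetical sanity check built on Python's standard `xml.etree.ElementTree`; it is not part of Label Studio, which runs its own validation when you save a labeling configuration.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<View>
  <Header value="Select text to correct" size="4"/>
  <OcrLabels name="ocr" toName="pdf">
    <Label value="Typo" />
    <Label value="Incorrect Amount" />
    <Label value="Incorrect Name" />
  </OcrLabels>
  <Pdf name="pdf" value="$pdf"/>
</View>
"""


def check_ocr_labels(config_xml):
    """Return a list of problems; an empty list means the check passed."""
    root = ET.fromstring(config_xml.strip())
    # Map every named tag to its tag type, e.g. {"ocr": "OcrLabels", "pdf": "Pdf"}.
    tags_by_name = {el.get("name"): el.tag for el in root.iter() if el.get("name")}
    problems = []
    for ocr in root.iter("OcrLabels"):
        target = ocr.get("toName")
        # OcrLabels must target a Pdf tag by its name attribute.
        if tags_by_name.get(target) != "Pdf":
            problems.append(
                f"OcrLabels '{ocr.get('name')}' must point to a Pdf tag, got toName={target!r}"
            )
        # Without Label children there is nothing to assign to a region.
        if not ocr.findall("Label"):
            problems.append(f"OcrLabels '{ocr.get('name')}' has no Label children")
    return problems


print(check_ocr_labels(CONFIG))  # prints []
```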

Input data

{
  "data": {
    "pdf": "/static/samples/opossum-cuteness.pdf"
  }
}
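To label many documents at once, you can import a JSON file that contains a list of tasks in the format above. A minimal sketch using only the standard library; the URLs and the `build_tasks` helper are hypothetical, and the `"pdf"` key is assumed to match the `$pdf` variable in the Pdf tag's `value` attribute.

```python
import json


def build_tasks(pdf_urls):
    """Wrap each PDF URL in the task format {"data": {"pdf": url}}."""
    # The "pdf" key must match the $pdf variable used in <Pdf value="$pdf"/>.
    return [{"data": {"pdf": url}} for url in pdf_urls]


# Hypothetical URLs for illustration only.
urls = [
    "https://example.com/invoices/0001.pdf",
    "https://example.com/invoices/0002.pdf",
]
tasks = build_tasks(urls)

# Serialize the task list; save this string to a .json file to import it.
payload = json.dumps(tasks, indent=2)
print(json.dumps(tasks[0]))  # prints {"data": {"pdf": "https://example.com/invoices/0001.pdf"}}
```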