
Docling Pipelines Reference

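Docling has two pipeline families for PDFs: standard (deterministic parsing + OCR + layout/table models) and VLM (each page rendered as an image and passed through a vision-language model). The VLM pipeline can run against a local model or a remote API, so this reference covers three configurations. The docling CLI exposes both families via --pipeline standard (default) and --pipeline vlm. The right choice depends on document type, hardware, and latency budget.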


Decision matrix

| Document type | Recommended pipeline | Reason |
|---|---|---|
| Born-digital PDF (text selectable) | Standard | Fast, accurate, no GPU needed |
| Scanned PDF / image-only | Standard + OCR or VLM | Depends on quality |
| Complex layout (multi-column, dense tables) | VLM | Better structural understanding |
| Handwriting, formulas, figures with embedded text | VLM | Only viable option |
| Air-gapped / no GPU | Standard | Runs on CPU |
| Production scale, GPU server available | VLM (vLLM) | Best throughput |
| Apple Silicon / local dev | VLM (MLX) | MPS acceleration |
| Speed-critical, accuracy secondary | Standard, no tables | Fastest path |
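The matrix can be read as a small rule set. The sketch below is one possible encoding, not part of Docling: DocTraits and choose_pipeline are hypothetical names, and the order in which the rules are checked (speed first, then hardware, then document difficulty) is an assumption, since the matrix itself does not rank its rows.

```python
from dataclasses import dataclass


@dataclass
class DocTraits:
    """Hypothetical description of a document and the machine processing it."""
    born_digital: bool = True
    complex_layout: bool = False
    handwriting_or_formulas: bool = False
    has_gpu: bool = False
    apple_silicon: bool = False
    speed_critical: bool = False


def choose_pipeline(t: DocTraits) -> str:
    """Return the --pipeline value ('standard' or 'vlm') suggested by the matrix."""
    if t.speed_critical:
        return "standard"          # fastest path; combine with --no-tables
    if not (t.has_gpu or t.apple_silicon):
        return "standard"          # CPU-only or air-gapped machine
    if t.handwriting_or_formulas or t.complex_layout:
        return "vlm"               # layout or content the standard stack handles poorly
    # Born-digital and clean scanned PDFs: standard pipeline (OCR covers the scans).
    return "standard"


if __name__ == "__main__":
    print(choose_pipeline(DocTraits(complex_layout=True, has_gpu=True)))
```

For low-quality scans the matrix leaves the choice open ("depends on quality"), so a rule like this is only a starting point for a manual comparison.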

Pipeline 1: Standard PDF Pipeline

Uses deterministic PDF parsing (docling-parse) + optional neural OCR + neural table structure detection.

CLI usage

# Default (standard pipeline, OCR + tables enabled)
docling report.pdf --output /tmp/

# Custom OCR engine
docling report.pdf --ocr-engine tesserocr --output /tmp/

# Disable OCR or tables
docling report.pdf --no-ocr --output /tmp/
docling report.pdf --no-tables --output /tmp/
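For batch jobs, the same commands can be assembled from Python. This is a minimal sketch that uses only the flags shown above; docling_cmd and convert_all are hypothetical helper names, and the sketch assumes the docling executable is on PATH.

```python
import subprocess
from pathlib import Path


def docling_cmd(pdf, out_dir, ocr=True, tables=True, ocr_engine=None):
    """Build the argv list for one standard-pipeline docling CLI call."""
    cmd = ["docling", str(pdf)]
    if not ocr:
        cmd.append("--no-ocr")
    elif ocr_engine:
        cmd += ["--ocr-engine", ocr_engine]
    if not tables:
        cmd.append("--no-tables")
    cmd += ["--output", str(out_dir)]
    return cmd


def convert_all(src_dir, out_dir, **kwargs):
    """Run docling once per PDF in src_dir; raises if any call fails."""
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        subprocess.run(docling_cmd(pdf, out_dir, **kwargs), check=True)


if __name__ == "__main__":
    print(" ".join(docling_cmd("report.pdf", "/tmp/", ocr_engine="tesserocr")))
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.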

Python API

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions

# Minimal — library defaults (standard PDF pipeline)
converter = DocumentConverter()

# Explicit PdfPipelineOptions (docling 2.81+): use InputFormat.PDF + PdfFormatOption.
# Do not use format_options={"pdf": opts}; that raises AttributeError on pipeline options.
opts = PdfPipelineOptions(
    do_ocr=True,                 # False = skip OCR entirely
    do_table_structure=True,     # False = skip table detection (faster)
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=opts),
    }
)
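Once a converter is configured, a typical next step is to convert a file and export the result. The sketch below assumes the convert() and export_to_markdown() calls of the docling 2.x API; convert_to_markdown and markdown_target are hypothetical helper names, and the docling imports are kept inside the function so the module loads even where docling is not installed.

```python
from pathlib import Path


def markdown_target(pdf_path, out_dir) -> Path:
    """Hypothetical helper: report.pdf -> <out_dir>/report.md."""
    return Path(out_dir) / (Path(pdf_path).stem + ".md")


def convert_to_markdown(pdf_path, out_dir, do_ocr=True, do_table_structure=True) -> Path:
    """Convert one PDF with the standard pipeline and write Markdown to out_dir."""
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    opts = PdfPipelineOptions(do_ocr=do_ocr, do_table_structure=do_table_structure)
    converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=opts)}
    )
    result = converter.convert(pdf_path)
    target = markdown_target(pdf_path, out_dir)
    target.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return target


if __name__ == "__main__":
    print(convert_to_markdown("report.pdf", "/tmp", do_table_structure=False))
```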

OCR engine options

Select an engine with the CLI --ocr-engine flag or the ocr_options parameter of PdfPipelineOptions in Python. The default is EasyOCR.

CLI flags

| Engine | CLI flag | Notes |
|---|---|---|
| EasyOCR | --ocr-engine easyocr (default) | No extra pip beyond docling defaults |
| RapidOCR | --ocr-engine rapidocr | Lightweight; see Docling notes on read-only FS |
| Tesseract (Python) | --ocr-engine tesserocr | Needs pip install tesserocr and system Tesseract |
| Tesseract (CLI) | --ocr-engine tesseract | Shells out to tesseract binary |
| macOS Vision | --ocr-engine ocrmac | macOS only |

Python API

# EasyOCR (default — no extra install needed)
from docling.datamodel.pipeline_options import PdfPipelineOptions
opts = PdfPipelineOptions(do_ocr=True)  # uses EasyOCR by default

# Tesseract (requires system Tesseract + pip install tesserocr — see Docling install docs)
from docling.datamodel.pipeline_options import TesseractOcrOptions
opts = PdfPipelineOptions(do_ocr=True, ocr_options=TesseractOcrOptions())

# RapidOCR (lightweight, no C deps)
from docling.datamodel.pipeline_options import RapidOcrOptions
opts = PdfPipelineOptions(do_ocr=True, ocr_options=RapidOcrOptions())

# macOS native OCR
from docling.datamodel.pipeline_options import OcrMacOptions
opts = PdfPipelineOptions(do_ocr=True, ocr_options=OcrMacOptions())
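The per-engine snippets above differ only in the options class, so the choice can be driven by the CLI engine name. The following is a sketch rather than a Docling facility: OCR_OPTION_CLASSES and make_pdf_options are hypothetical names, and the EasyOcrOptions and TesseractCliOcrOptions class names are assumed to live in the same pipeline_options module as the classes imported above.

```python
import importlib

# CLI engine name -> options class name in docling.datamodel.pipeline_options.
OCR_OPTION_CLASSES = {
    "easyocr": "EasyOcrOptions",            # assumed class name
    "rapidocr": "RapidOcrOptions",
    "tesserocr": "TesseractOcrOptions",
    "tesseract": "TesseractCliOcrOptions",  # assumed class name
    "ocrmac": "OcrMacOptions",
}


def make_pdf_options(engine="easyocr"):
    """Return PdfPipelineOptions with OCR enabled for the named engine."""
    if engine not in OCR_OPTION_CLASSES:
        raise ValueError(f"unknown OCR engine: {engine!r}")
    # Imported lazily so the lookup table is usable without docling installed.
    mod = importlib.import_module("docling.datamodel.pipeline_options")
    ocr_cls = getattr(mod, OCR_OPTION_CLASSES[engine])
    return mod.PdfPipelineOptions(do_ocr=True, ocr_options=ocr_cls())


if __name__ == "__main__":
    print(make_pdf_options("rapidocr"))
```

Validating the name before importing keeps configuration typos from surfacing as a less obvious AttributeError.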

Pipeline 2: VLM Pipeline — local inference

Processes each page as an image through a vision-language model. Replaces the standard layout detection + OCR stack entirely.

CLI usage

# Default VLM model (granite_docling)
docling report.pdf --pipeline vlm --output /tmp/

# Specific model
docling report.pdf --pipeline vlm --vlm-model smoldocling --output /tmp/

Python API

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.datamodel import vlm_model_specs
from docling.pipeline.vlm_pipeline import VlmPipeline

pipeline_options = VlmPipelineOptions(
    vlm_options=vlm_model_specs.GRANITEDOCLING_TRANSFORMERS,
    generate_page_images=True,
)

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        )
    }
)

Available model presets

| CLI --vlm-model | Python preset (vlm_model_specs) | Backend | Device | Notes |
|---|---|---|---|---|
| granite_docling | GRANITEDOCLING_TRANSFORMERS | HF Transformers | CPU/GPU | Default |
| smoldocling | SMOLDOCLING_TRANSFORMERS | HF Transformers | CPU/GPU | Lighter |
| (Python API only) | GRANITEDOCLING_VLLM | vLLM | GPU | Fast batch |
| (Python API only) | GRANITEDOCLING_MLX | MLX | Apple MPS | M-series Macs |
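A script that runs on several machines can pick a preset from the table at start-up. This is a sketch under stated assumptions: pick_preset_name and load_preset are hypothetical helpers, the caller decides whether vLLM is usable (the vllm_available flag), and the platform check for Apple Silicon relies on platform.system() and platform.machine() reporting "Darwin" and "arm64".

```python
import platform


def pick_preset_name(system=None, machine=None, vllm_available=False) -> str:
    """Return the vlm_model_specs attribute name suited to this machine."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "GRANITEDOCLING_MLX"           # M-series Mac
    if vllm_available:
        return "GRANITEDOCLING_VLLM"          # GPU server with vLLM installed
    return "GRANITEDOCLING_TRANSFORMERS"      # portable default (CPU/GPU)


def load_preset(name: str):
    """Resolve a preset name to the options object; requires docling."""
    from docling.datamodel import vlm_model_specs
    return getattr(vlm_model_specs, name)


if __name__ == "__main__":
    print(pick_preset_name())
```

The returned object from load_preset can be passed as vlm_options to VlmPipelineOptions, as in the Python API example above.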

Hybrid mode: PDF text + VLM for images/tables

Set force_backend_text=True (Python API only) to use deterministic text extraction for normal text regions while routing images and tables through the VLM. Reduces hallucination risk on text-heavy pages.

pipeline_options = VlmPipelineOptions(
    vlm_options=vlm_model_specs.GRANITEDOCLING_TRANSFORMERS,
    force_backend_text=True,   # <-- hybrid mode
    generate_page_images=True,
)

Pipeline 3: VLM Pipeline — remote API

Sends page images to any OpenAI-compatible endpoint. Works with vLLM, LM Studio, Ollama, or a hosted model API.

The CLI can enable this mode with --pipeline vlm --enable-remote-services, but configuring the endpoint URL, model name, and API key requires the Python API.

CLI usage (basic)

docling report.pdf --pipeline vlm --enable-remote-services --output /tmp/

Python API (full configuration)

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.datamodel.pipeline_options_vlm_model import ApiVlmOptions, ResponseFormat
from docling.pipeline.vlm_pipeline import VlmPipeline

vlm_opts = ApiVlmOptions(
    url="http://localhost:8000/v1/chat/completions",
    params=dict(
        model="ibm-granite/granite-docling-258M",
        max_tokens=4096,
    ),
    headers={"Authorization": "Bearer YOUR_KEY"},  # omit if not needed
    prompt="Convert this page to docling.",
    response_format=ResponseFormat.DOCTAGS,
    timeout=120,
    scale=2.0,
)

pipeline_options = VlmPipelineOptions(
    vlm_options=vlm_opts,
    generate_page_images=True,
    enable_remote_services=True,  # required — gates any HTTP call
)

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        )
    }
)

enable_remote_services=True is mandatory for API pipelines. As a safety measure, Docling refuses to send document content to a remote service unless this flag is set.

Common API targets

| Server | Default URL | Notes |
|---|---|---|
| vLLM | http://localhost:8000/v1/chat/completions | Best throughput |
| LM Studio | http://localhost:1234/v1/chat/completions | Local dev |
| Ollama | http://localhost:11434/v1/chat/completions | Model: ibm/granite-docling:258m |
| OpenAI-compatible cloud | Provider URL | Set Authorization header |
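Switching between these servers changes only the URL, the model name, and the optional Authorization header. The sketch below captures that; API_TARGETS, api_vlm_kwargs, and make_api_options are hypothetical names, and the remaining values (prompt, max_tokens, timeout, scale) are simply copied from the full configuration example above rather than tuned per server.

```python
# Default local endpoints from the table above.
API_TARGETS = {
    "vllm": "http://localhost:8000/v1/chat/completions",
    "lmstudio": "http://localhost:1234/v1/chat/completions",
    "ollama": "http://localhost:11434/v1/chat/completions",
}


def api_vlm_kwargs(server, model, api_key=None, url=None) -> dict:
    """Build keyword arguments for ApiVlmOptions (response_format added later)."""
    kwargs = dict(
        url=url or API_TARGETS[server],   # pass url= for a hosted provider
        params=dict(model=model, max_tokens=4096),
        prompt="Convert this page to docling.",
        timeout=120,
        scale=2.0,
    )
    if api_key:
        kwargs["headers"] = {"Authorization": f"Bearer {api_key}"}
    return kwargs


def make_api_options(server, model, api_key=None, url=None):
    """Create ApiVlmOptions for a server; requires docling."""
    from docling.datamodel.pipeline_options_vlm_model import ApiVlmOptions, ResponseFormat
    return ApiVlmOptions(
        response_format=ResponseFormat.DOCTAGS,
        **api_vlm_kwargs(server, model, api_key, url),
    )


if __name__ == "__main__":
    print(api_vlm_kwargs("ollama", "ibm/granite-docling:258m"))
```

The resulting options still need VlmPipelineOptions(enable_remote_services=True) around them before any request is sent.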

VLM install requirements

Local inference requires PyTorch + Transformers:

pip install "docling[vlm]"   # quotes keep zsh from globbing the brackets
# or manually:
pip install torch transformers accelerate

MLX (Apple Silicon only):

pip install mlx-vlm

vLLM backend (server-side):

pip install vllm
vllm serve ibm-granite/granite-docling-258M