Minimal VLM pipeline
Minimal VLM pipeline example: convert a PDF using a vision-language model.
What this example does
- Runs the VLM-powered pipeline on a PDF (by URL) and prints Markdown output.
- Shows three setups: default (no config), using presets, and runtime overrides.
- Demonstrates both the simplest approach and the NEW preset-based system.
Prerequisites
- Install Docling with VLM extras and the appropriate backend (Transformers or MLX).
- Ensure your environment can download model weights (e.g., from Hugging Face).
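Before running the example, it can help to confirm that at least one backend is importable. The following is a minimal sketch using only the standard library. The import names `transformers` and `mlx_vlm` are assumptions about how the backends are packaged, and `available_backends` is a hypothetical helper, not part of Docling.

```python
import importlib.util


def available_backends(candidates=("transformers", "mlx_vlm")):
    """Map each candidate import name to whether it can be found locally.

    The default names are assumptions; adjust them to match your install.
    Only top-level module names are supported here.
    """
    return {
        name: importlib.util.find_spec(name) is not None
        for name in candidates
    }


if __name__ == "__main__":
    # Report which assumed backend modules are importable.
    for name, found in available_backends().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

This only checks that a module can be located; it does not verify versions or that model weights can be downloaded.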
How to run
- From the repository root, run: python docs/examples/minimal_vlm_pipeline.py.
- The script prints the converted Markdown to stdout.
Notes
- source may be a local path or a URL to a PDF.
- For the LEGACY approach (backward compatibility), see docs/examples/minimal_vlm_pipeline_legacy.py.
- For more preset examples and runtime options, see docs/examples/vlm_presets_and_runtimes.py.
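Since `source` accepts either form, a script that takes it from user input may want to tell the two apart and fail early on a missing file. A minimal sketch, assuming a source counts as a URL only when it has an `http` or `https` scheme. `classify_source` and `check_source` are hypothetical helpers and say nothing about how Docling resolves sources internally.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "url" for http(s) sources, otherwise "path"."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "path"


def check_source(source: str) -> str:
    """Classify the source and raise if a local file does not exist."""
    kind = classify_source(source)
    if kind == "path" and not Path(source).is_file():
        raise FileNotFoundError(f"No such file: {source}")
    return kind
```

Calling `check_source(source)` before `converter.convert(source=source)` surfaces a mistyped local path without waiting for the model to load.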
import platform

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    VlmConvertOptions,
    VlmPipelineOptions,
)
from docling.datamodel.vlm_engine_options import (
    MlxVlmEngineOptions,
    TransformersVlmEngineOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline
# Convert a public arXiv PDF; replace with a local path if preferred.
source = "https://arxiv.org/pdf/2501.17887"
###### EXAMPLE 1: USING DEFAULT SETTINGS (SIMPLEST)
# - No configuration needed
# - Uses default VLM model (GraniteDocling)
# - Auto-selects the best runtime for your platform
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
        ),
    }
)
doc = converter.convert(source=source).document
print(doc.export_to_markdown())
###### EXAMPLE 2: USING PRESETS (RECOMMENDED)
# - Uses the "granite_docling" preset explicitly
# - Same as default but more explicit and configurable
# - Auto-selects the best runtime for your platform (Transformers by default)
vlm_options = VlmConvertOptions.from_preset("granite_docling")
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=VlmPipelineOptions(vlm_options=vlm_options),
        ),
    }
)
doc = converter.convert(source=source).document
print(doc.export_to_markdown())
###### EXAMPLE 3: USING PRESETS WITH RUNTIME OVERRIDE (ADVANCED)
# Demonstrates using the same preset but overriding the runtime explicitly.
# MLX is Apple Silicon only, so keep the example portable by using MLX on
# macOS/arm64 and Transformers everywhere else, including Linux CI.
engine_options = (
    MlxVlmEngineOptions()
    if platform.system() == "Darwin" and platform.machine() == "arm64"
    else TransformersVlmEngineOptions()
)
vlm_options = VlmConvertOptions.from_preset(
    "granite_docling",
    engine_options=engine_options,
)
# The preset automatically selects the model variant matching the runtime.
print(
    "Using model: "
    f"{vlm_options.model_spec.get_repo_id(vlm_options.engine_options.engine_type)}"
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=VlmPipelineOptions(vlm_options=vlm_options),
        ),
    }
)
doc = converter.convert(source=source).document
print(doc.export_to_markdown())
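The platform check in Example 3 can be pulled into a small pure function so the selection logic is testable without loading any model. This is a sketch: `pick_engine_name` and the returned labels `"mlx"` and `"transformers"` are illustrative names, not Docling identifiers.

```python
import platform
from typing import Optional


def pick_engine_name(
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> str:
    """Return "mlx" on Apple Silicon macOS, otherwise "transformers".

    Arguments default to the values reported by the current interpreter,
    but can be passed explicitly to exercise each branch in tests.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "transformers"


print(pick_engine_name("Darwin", "arm64"))   # mlx
print(pick_engine_name("Linux", "x86_64"))   # transformers
print(pick_engine_name("Darwin", "x86_64"))  # transformers (Intel Mac)
```

The returned label can then be mapped to `MlxVlmEngineOptions()` or `TransformersVlmEngineOptions()` in the same way Example 3 does inline.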