Pipeline options

Pipeline options allow you to customize how models are executed during the conversion pipeline. This includes options for the OCR engines and the table model, as well as enrichment options that can be enabled with do_xyz = True.

This is an automatically generated API reference of all the pipeline options available in Docling.
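
As a quick orientation, the following sketch shows how a configured options object is passed to a converter. Import paths follow recent Docling releases; the input file and option choices are illustrative.

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Enrichment features follow the do_xyz = True pattern described above.
pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True
pipeline_options.do_picture_classification = True

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
result = converter.convert("document.pdf")  # illustrative input file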

pipeline_options

Attributes:

granite_picture_description module-attribute

granite_picture_description = PictureDescriptionVlmOptions(repo_id='ibm-granite/granite-vision-3.3-2b', prompt='What is shown in this image?')

Pre-configured Granite Vision model options for picture description.

Uses IBM's Granite Vision 3.3-2B model with a custom prompt for generating detailed descriptions of image content.

smolvlm_picture_description module-attribute

smolvlm_picture_description = PictureDescriptionVlmOptions(repo_id='HuggingFaceTB/SmolVLM-256M-Instruct')

Pre-configured SmolVLM model options for picture description.

Uses the HuggingFace SmolVLM-256M-Instruct model, a lightweight vision-language model optimized for generating natural language descriptions of images.
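
Both presets can be plugged directly into a conversion pipeline. A minimal sketch, assuming the standard PDF pipeline:

from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    granite_picture_description,
    smolvlm_picture_description,
)

pipeline_options = PdfPipelineOptions()
pipeline_options.do_picture_description = True
# Use the lightweight SmolVLM preset...
pipeline_options.picture_description_options = smolvlm_picture_description
# ...or swap in the Granite Vision preset:
# pipeline_options.picture_description_options = granite_picture_description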

AsrPipelineOptions pydantic-model

Bases: PipelineOptions

Configuration options for the Automatic Speech Recognition (ASR) pipeline.

This pipeline processes audio files and converts speech to text using Whisper-based models. Supports various audio formats (MP3, WAV, FLAC, etc.) and video files with audio tracks.
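
A minimal sketch of wiring these options into a converter for audio input, assuming the predefined Whisper specs in docling.datamodel.asr_model_specs and the AsrPipeline class (adjust imports to your installed version):

from docling.datamodel import asr_model_specs
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import AsrPipelineOptions
from docling.document_converter import AudioFormatOption, DocumentConverter
from docling.pipeline.asr_pipeline import AsrPipeline

pipeline_options = AsrPipelineOptions()
pipeline_options.asr_options = asr_model_specs.WHISPER_TINY  # predefined Whisper spec

converter = DocumentConverter(
    format_options={
        InputFormat.AUDIO: AudioFormatOption(
            pipeline_cls=AsrPipeline,
            pipeline_options=pipeline_options,
        )
    }
)
result = converter.convert("recording.mp3")  # illustrative audio file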

Show JSON schema:
{
  "$defs": {
    "AcceleratorDevice": {
      "description": "Devices to run model inference",
      "enum": [
        "auto",
        "cpu",
        "cuda",
        "mps",
        "xpu"
      ],
      "title": "AcceleratorDevice",
      "type": "string"
    },
    "AcceleratorOptions": {
      "additionalProperties": false,
      "description": "Hardware acceleration configuration for model inference.\n\nCan be configured via environment variables with DOCLING_ prefix.",
      "properties": {
        "num_threads": {
          "default": 4,
          "description": "Number of CPU threads to use for model inference. Higher values can improve throughput on multi-core systems but may increase memory usage. Can be set via DOCLING_NUM_THREADS or OMP_NUM_THREADS environment variables. Recommended: number of physical CPU cores.",
          "title": "Num Threads",
          "type": "integer"
        },
        "device": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "$ref": "#/$defs/AcceleratorDevice"
            }
          ],
          "default": "auto",
          "description": "Hardware device for model inference. Options: `auto` (automatic detection), `cpu` (CPU only), `cuda` (NVIDIA GPU), `cuda:N` (specific GPU), `mps` (Apple Silicon), `xpu` (Intel GPU). Auto mode selects the best available device. Can be set via DOCLING_DEVICE environment variable.",
          "title": "Device"
        },
        "cuda_use_flash_attention2": {
          "default": false,
          "description": "Enable Flash Attention 2 optimization for CUDA devices. Provides significant speedup and memory reduction for transformer models on compatible NVIDIA GPUs (Ampere or newer). Requires flash-attn package installation. Can be set via DOCLING_CUDA_USE_FLASH_ATTENTION2 environment variable.",
          "title": "Cuda Use Flash Attention2",
          "type": "boolean"
        }
      },
      "title": "AcceleratorOptions",
      "type": "object"
    },
    "InlineAsrOptions": {
      "description": "Configuration for inline ASR models running locally.",
      "properties": {
        "kind": {
          "const": "inline_model_options",
          "default": "inline_model_options",
          "title": "Kind",
          "type": "string"
        },
        "repo_id": {
          "description": "HuggingFace model repository ID for the ASR model. Must be a Whisper-compatible model for automatic speech recognition.",
          "examples": [
            "openai/whisper-tiny",
            "openai/whisper-base"
          ],
          "title": "Repo Id",
          "type": "string"
        },
        "verbose": {
          "default": false,
          "description": "Enable verbose logging output from the ASR model for debugging purposes.",
          "title": "Verbose",
          "type": "boolean"
        },
        "timestamps": {
          "default": true,
          "description": "Generate timestamps for transcribed segments. When enabled, each transcribed segment includes start and end times for temporal alignment with the audio.",
          "title": "Timestamps",
          "type": "boolean"
        },
        "temperature": {
          "default": 0.0,
          "description": "Sampling temperature for text generation. 0.0 uses greedy decoding (deterministic), higher values (e.g., 0.7-1.0) increase randomness. Recommended: 0.0 for consistent transcriptions.",
          "title": "Temperature",
          "type": "number"
        },
        "max_new_tokens": {
          "default": 256,
          "description": "Maximum number of tokens to generate per transcription segment. Limits output length to prevent runaway generation. Adjust based on expected transcript length.",
          "title": "Max New Tokens",
          "type": "integer"
        },
        "max_time_chunk": {
          "default": 30.0,
          "description": "Maximum duration in seconds for each audio chunk processed by the model. Audio longer than this is split into chunks. Whisper models are typically trained on 30-second segments.",
          "title": "Max Time Chunk",
          "type": "number"
        },
        "torch_dtype": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "PyTorch data type for model weights. Options: `float32`, `float16`, `bfloat16`. Lower precision (float16/bfloat16) reduces memory usage and increases speed. If None, uses model default.",
          "title": "Torch Dtype"
        },
        "supported_devices": {
          "default": [
            "cpu",
            "cuda",
            "mps",
            "xpu"
          ],
          "description": "List of hardware accelerators supported by this ASR model configuration.",
          "items": {
            "$ref": "#/$defs/AcceleratorDevice"
          },
          "title": "Supported Devices",
          "type": "array"
        }
      },
      "required": [
        "repo_id"
      ],
      "title": "InlineAsrOptions",
      "type": "object"
    }
  },
  "description": "Configuration options for the Automatic Speech Recognition (ASR) pipeline.\n\nThis pipeline processes audio files and converts speech to text using Whisper-based models.\nSupports various audio formats (MP3, WAV, FLAC, etc.) and video files with audio tracks.",
  "properties": {
    "document_timeout": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.",
      "examples": [
        10.0,
        20.0
      ],
      "title": "Document Timeout"
    },
    "accelerator_options": {
      "$ref": "#/$defs/AcceleratorOptions",
      "default": {
        "num_threads": 4,
        "device": "auto",
        "cuda_use_flash_attention2": false
      },
      "description": "Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models."
    },
    "enable_remote_services": {
      "default": false,
      "description": "Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.",
      "examples": [
        false
      ],
      "title": "Enable Remote Services",
      "type": "boolean"
    },
    "allow_external_plugins": {
      "default": false,
      "description": "Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.",
      "examples": [
        false
      ],
      "title": "Allow External Plugins",
      "type": "boolean"
    },
    "artifacts_path": {
      "anyOf": [
        {
          "format": "path",
          "type": "string"
        },
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use `docling-tools models download` to pre-fetch artifacts for offline operation or faster initialization.",
      "examples": [
        "./artifacts",
        "/tmp/docling_outputs"
      ],
      "title": "Artifacts Path"
    },
    "asr_options": {
      "$ref": "#/$defs/InlineAsrOptions",
      "default": {
        "kind": "inline_model_options",
        "repo_id": "tiny",
        "verbose": true,
        "timestamps": true,
        "temperature": 0.0,
        "max_new_tokens": 256,
        "max_time_chunk": 30.0,
        "torch_dtype": null,
        "supported_devices": [
          "cpu",
          "cuda"
        ],
        "inference_framework": "whisper",
        "language": "en",
        "word_timestamps": true
      },
      "description": "Automatic Speech Recognition (ASR) model configuration for audio transcription. Specifies which ASR model to use (e.g., Whisper variants) and model-specific parameters for speech-to-text conversion."
    }
  },
  "title": "AsrPipelineOptions",
  "type": "object"
}

Fields:

accelerator_options pydantic-field

accelerator_options: AcceleratorOptions

Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models.

allow_external_plugins pydantic-field

allow_external_plugins: bool

Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.

artifacts_path pydantic-field

artifacts_path: Optional[Union[Path, str]]

Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.

asr_options pydantic-field

asr_options: InlineAsrOptions

Automatic Speech Recognition (ASR) model configuration for audio transcription. Specifies which ASR model to use (e.g., Whisper variants) and model-specific parameters for speech-to-text conversion.

document_timeout pydantic-field

document_timeout: Optional[float]

Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.

enable_remote_services pydantic-field

enable_remote_services: bool

Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.

kind class-attribute

kind: str

BaseLayoutOptions pydantic-model

Bases: BaseOptions

Base options for layout models.

Show JSON schema:
{
  "description": "Base options for layout models.",
  "properties": {
    "keep_empty_clusters": {
      "default": false,
      "description": "Retain empty clusters in layout analysis results. When False, clusters without content are removed. Enable for debugging or when empty regions are semantically important.",
      "title": "Keep Empty Clusters",
      "type": "boolean"
    },
    "skip_cell_assignment": {
      "default": false,
      "description": "Skip assignment of cells to table structures during layout analysis. When True, cells are detected but not associated with tables. Use for performance optimization when table structure is not needed.",
      "title": "Skip Cell Assignment",
      "type": "boolean"
    }
  },
  "title": "BaseLayoutOptions",
  "type": "object"
}

Fields:

keep_empty_clusters pydantic-field

keep_empty_clusters: bool

Retain empty clusters in layout analysis results. When False, clusters without content are removed. Enable for debugging or when empty regions are semantically important.

kind class-attribute

kind: str

skip_cell_assignment pydantic-field

skip_cell_assignment: bool

Skip assignment of cells to table structures during layout analysis. When True, cells are detected but not associated with tables. Use for performance optimization when table structure is not needed.

BaseOptions pydantic-model

Bases: BaseModel

Base class for options.

Show JSON schema:
{
  "description": "Base class for options.",
  "properties": {},
  "title": "BaseOptions",
  "type": "object"
}

kind class-attribute

kind: str

BaseTableStructureOptions pydantic-model

Bases: BaseOptions

Base options for table structure models.

Show JSON schema:
{
  "description": "Base options for table structure models.",
  "properties": {},
  "title": "BaseTableStructureOptions",
  "type": "object"
}

kind class-attribute

kind: str

ConvertPipelineOptions pydantic-model

Bases: PipelineOptions

Base configuration for document conversion pipelines.

Show JSON schema:
{
  "$defs": {
    "AcceleratorDevice": {
      "description": "Devices to run model inference",
      "enum": [
        "auto",
        "cpu",
        "cuda",
        "mps",
        "xpu"
      ],
      "title": "AcceleratorDevice",
      "type": "string"
    },
    "AcceleratorOptions": {
      "additionalProperties": false,
      "description": "Hardware acceleration configuration for model inference.\n\nCan be configured via environment variables with DOCLING_ prefix.",
      "properties": {
        "num_threads": {
          "default": 4,
          "description": "Number of CPU threads to use for model inference. Higher values can improve throughput on multi-core systems but may increase memory usage. Can be set via DOCLING_NUM_THREADS or OMP_NUM_THREADS environment variables. Recommended: number of physical CPU cores.",
          "title": "Num Threads",
          "type": "integer"
        },
        "device": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "$ref": "#/$defs/AcceleratorDevice"
            }
          ],
          "default": "auto",
          "description": "Hardware device for model inference. Options: `auto` (automatic detection), `cpu` (CPU only), `cuda` (NVIDIA GPU), `cuda:N` (specific GPU), `mps` (Apple Silicon), `xpu` (Intel GPU). Auto mode selects the best available device. Can be set via DOCLING_DEVICE environment variable.",
          "title": "Device"
        },
        "cuda_use_flash_attention2": {
          "default": false,
          "description": "Enable Flash Attention 2 optimization for CUDA devices. Provides significant speedup and memory reduction for transformer models on compatible NVIDIA GPUs (Ampere or newer). Requires flash-attn package installation. Can be set via DOCLING_CUDA_USE_FLASH_ATTENTION2 environment variable.",
          "title": "Cuda Use Flash Attention2",
          "type": "boolean"
        }
      },
      "title": "AcceleratorOptions",
      "type": "object"
    },
    "PictureClassificationLabel": {
      "description": "PictureClassificationLabel.",
      "enum": [
        "other",
        "picture_group",
        "pie_chart",
        "bar_chart",
        "stacked_bar_chart",
        "line_chart",
        "flow_chart",
        "scatter_chart",
        "heatmap",
        "remote_sensing",
        "natural_image",
        "chemistry_molecular_structure",
        "chemistry_markush_structure",
        "icon",
        "logo",
        "signature",
        "stamp",
        "qr_code",
        "bar_code",
        "screenshot",
        "map",
        "stratigraphic_chart",
        "cad_drawing",
        "electrical_diagram"
      ],
      "title": "PictureClassificationLabel",
      "type": "string"
    },
    "PictureDescriptionBaseOptions": {
      "description": "Base configuration for picture description models.",
      "properties": {
        "batch_size": {
          "default": 8,
          "description": "Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.",
          "title": "Batch Size",
          "type": "integer"
        },
        "scale": {
          "default": 2.0,
          "description": "Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.",
          "title": "Scale",
          "type": "number"
        },
        "picture_area_threshold": {
          "default": 0.05,
          "description": "Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.",
          "title": "Picture Area Threshold",
          "type": "number"
        },
        "classification_allow": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/PictureClassificationLabel"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).",
          "title": "Classification Allow"
        },
        "classification_deny": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/PictureClassificationLabel"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, no picture types are denied unless not in allow list. Use to exclude unwanted image types (e.g., decorative images, logos).",
          "title": "Classification Deny"
        },
        "classification_min_confidence": {
          "default": 0.0,
          "description": "Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).",
          "title": "Classification Min Confidence",
          "type": "number"
        }
      },
      "title": "PictureDescriptionBaseOptions",
      "type": "object"
    }
  },
  "description": "Base configuration for document conversion pipelines.",
  "properties": {
    "document_timeout": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.",
      "examples": [
        10.0,
        20.0
      ],
      "title": "Document Timeout"
    },
    "accelerator_options": {
      "$ref": "#/$defs/AcceleratorOptions",
      "default": {
        "num_threads": 4,
        "device": "auto",
        "cuda_use_flash_attention2": false
      },
      "description": "Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models."
    },
    "enable_remote_services": {
      "default": false,
      "description": "Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.",
      "examples": [
        false
      ],
      "title": "Enable Remote Services",
      "type": "boolean"
    },
    "allow_external_plugins": {
      "default": false,
      "description": "Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.",
      "examples": [
        false
      ],
      "title": "Allow External Plugins",
      "type": "boolean"
    },
    "artifacts_path": {
      "anyOf": [
        {
          "format": "path",
          "type": "string"
        },
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use `docling-tools models download` to pre-fetch artifacts for offline operation or faster initialization.",
      "examples": [
        "./artifacts",
        "/tmp/docling_outputs"
      ],
      "title": "Artifacts Path"
    },
    "do_picture_classification": {
      "default": false,
      "description": "Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.",
      "title": "Do Picture Classification",
      "type": "boolean"
    },
    "do_picture_description": {
      "default": false,
      "description": "Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.",
      "title": "Do Picture Description",
      "type": "boolean"
    },
    "picture_description_options": {
      "$ref": "#/$defs/PictureDescriptionBaseOptions",
      "default": {
        "batch_size": 8,
        "scale": 2.0,
        "picture_area_threshold": 0.05,
        "classification_allow": null,
        "classification_deny": null,
        "classification_min_confidence": 0.0,
        "repo_id": "HuggingFaceTB/SmolVLM-256M-Instruct",
        "prompt": "Describe this image in a few sentences.",
        "generation_config": {
          "do_sample": false,
          "max_new_tokens": 200
        }
      },
      "description": "Configuration for picture description model. Specifies which vision model to use (API or inline) and model-specific parameters. Only applicable when `do_picture_description=True`."
    }
  },
  "title": "ConvertPipelineOptions",
  "type": "object"
}

Fields:

accelerator_options pydantic-field

accelerator_options: AcceleratorOptions

Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models.

allow_external_plugins pydantic-field

allow_external_plugins: bool

Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.

artifacts_path pydantic-field

artifacts_path: Optional[Union[Path, str]]

Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.

do_picture_classification pydantic-field

do_picture_classification: bool

Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.

do_picture_description pydantic-field

do_picture_description: bool

Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.

document_timeout pydantic-field

document_timeout: Optional[float]

Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.

enable_remote_services pydantic-field

enable_remote_services: bool

Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.

kind class-attribute

kind: str

picture_description_options pydantic-field

picture_description_options: PictureDescriptionBaseOptions

Configuration for picture description model. Specifies which vision model to use (API or inline) and model-specific parameters. Only applicable when do_picture_description=True.
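
When using an API-based picture description model, enable_remote_services must also be set. A sketch assuming the PictureDescriptionApiOptions class from recent Docling releases, with an illustrative endpoint and model name:

from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionApiOptions,
)

pipeline_options = PdfPipelineOptions()
pipeline_options.do_picture_description = True
pipeline_options.enable_remote_services = True  # required for API-based models
pipeline_options.picture_description_options = PictureDescriptionApiOptions(
    url="http://localhost:8000/v1/chat/completions",  # illustrative endpoint
    params={"model": "your-vlm-model"},               # illustrative model name
    prompt="Describe this image in a few sentences.",
    timeout=90,
)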

EasyOcrOptions pydantic-model

Bases: OcrOptions

Configuration for EasyOCR engine.

Show JSON schema:
{
  "additionalProperties": false,
  "description": "Configuration for EasyOCR engine.",
  "properties": {
    "lang": {
      "default": [
        "fr",
        "de",
        "es",
        "en"
      ],
      "description": "List of language codes for OCR. EasyOCR supports 80+ languages. Use ISO 639-1 codes (e.g., `en`, `fr`, `de`). Multiple languages can be specified for multilingual documents.",
      "items": {
        "type": "string"
      },
      "title": "Lang",
      "type": "array"
    },
    "force_full_page_ocr": {
      "default": false,
      "description": "If enabled, a full-page OCR is always applied.",
      "examples": [
        false
      ],
      "title": "Force Full Page Ocr",
      "type": "boolean"
    },
    "bitmap_area_threshold": {
      "default": 0.05,
      "description": "Percentage of the page area for a bitmap to be processed with OCR.",
      "examples": [
        0.05,
        0.1
      ],
      "title": "Bitmap Area Threshold",
      "type": "number"
    },
    "use_gpu": {
      "anyOf": [
        {
          "type": "boolean"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Enable GPU acceleration for EasyOCR. If None, automatically detects and uses GPU if available. Set to False to force CPU-only processing.",
      "title": "Use Gpu"
    },
    "confidence_threshold": {
      "default": 0.5,
      "description": "Minimum confidence score for text recognition. Text with confidence below this threshold is filtered out. Range: 0.0-1.0. Lower values include more text but may reduce accuracy.",
      "title": "Confidence Threshold",
      "type": "number"
    },
    "model_storage_directory": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Directory path for storing downloaded EasyOCR models. If None, uses default EasyOCR cache location. Useful for offline environments or custom model management.",
      "title": "Model Storage Directory"
    },
    "recog_network": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": "standard",
      "description": "Recognition network architecture to use. Options: `standard` (default, balanced), `craft` (higher accuracy). Different networks may perform better on specific document types.",
      "title": "Recog Network"
    },
    "download_enabled": {
      "default": true,
      "description": "Allow automatic download of EasyOCR models on first use. Disable for offline environments where models must be pre-installed.",
      "title": "Download Enabled",
      "type": "boolean"
    },
    "suppress_mps_warnings": {
      "default": true,
      "description": "Suppress Metal Performance Shaders (MPS) warnings on macOS. Reduces console noise when using Apple Silicon GPUs with EasyOCR.",
      "title": "Suppress Mps Warnings",
      "type": "boolean"
    }
  },
  "title": "EasyOcrOptions",
  "type": "object"
}

Config:

  • extra: forbid
  • protected_namespaces: ()

Fields:

bitmap_area_threshold pydantic-field

bitmap_area_threshold: float

Percentage of the page area for a bitmap to be processed with OCR.

confidence_threshold pydantic-field

confidence_threshold: float

Minimum confidence score for text recognition. Text with confidence below this threshold is filtered out. Range: 0.0-1.0. Lower values include more text but may reduce accuracy.

download_enabled pydantic-field

download_enabled: bool

Allow automatic download of EasyOCR models on first use. Disable for offline environments where models must be pre-installed.

force_full_page_ocr pydantic-field

force_full_page_ocr: bool

If enabled, a full-page OCR is always applied.

kind class-attribute

kind: Literal['easyocr'] = 'easyocr'

lang pydantic-field

lang: list[str]

List of language codes for OCR. EasyOCR supports 80+ languages. Use ISO 639-1 codes (e.g., en, fr, de). Multiple languages can be specified for multilingual documents.

model_config class-attribute instance-attribute

model_config = ConfigDict(extra='forbid', protected_namespaces=())

model_storage_directory pydantic-field

model_storage_directory: Optional[str]

Directory path for storing downloaded EasyOCR models. If None, uses default EasyOCR cache location. Useful for offline environments or custom model management.

recog_network pydantic-field

recog_network: Optional[str]

Recognition network architecture to use. Options: standard (default, balanced), craft (higher accuracy). Different networks may perform better on specific document types.

suppress_mps_warnings pydantic-field

suppress_mps_warnings: bool

Suppress Metal Performance Shaders (MPS) warnings on macOS. Reduces console noise when using Apple Silicon GPUs with EasyOCR.

use_gpu pydantic-field

use_gpu: Optional[bool]

Enable GPU acceleration for EasyOCR. If None, automatically detects and uses GPU if available. Set to False to force CPU-only processing.
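
A sketch of a customized EasyOCR configuration using the fields above (values are illustrative):

from docling.datamodel.pipeline_options import EasyOcrOptions, PdfPipelineOptions

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.ocr_options = EasyOcrOptions(
    lang=["en", "de"],         # ISO 639-1 codes
    confidence_threshold=0.5,  # filter out low-confidence text
    use_gpu=None,              # auto-detect GPU availability
    download_enabled=False,    # e.g. offline environments with pre-installed models
)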

LayoutOptions pydantic-model

Bases: BaseLayoutOptions

Options for layout processing.

Show JSON schema:
{
  "$defs": {
    "AcceleratorDevice": {
      "description": "Devices to run model inference",
      "enum": [
        "auto",
        "cpu",
        "cuda",
        "mps",
        "xpu"
      ],
      "title": "AcceleratorDevice",
      "type": "string"
    },
    "LayoutModelConfig": {
      "description": "Configuration for document layout analysis models from HuggingFace.",
      "properties": {
        "name": {
          "description": "Human-readable name identifier for the layout model. Used for logging, debugging, and model selection.",
          "examples": [
            "docling_layout_heron",
            "docling_layout_egret_large"
          ],
          "title": "Name",
          "type": "string"
        },
        "repo_id": {
          "description": "HuggingFace repository ID where the model is hosted. Used to download model weights and configuration files from HuggingFace Hub.",
          "examples": [
            "docling-project/docling-layout-heron",
            "docling-project/docling-layout-egret-large"
          ],
          "title": "Repo Id",
          "type": "string"
        },
        "revision": {
          "description": "Git revision (branch, tag, or commit hash) of the model repository to use. Allows pinning to specific model versions for reproducibility.",
          "examples": [
            "main",
            "v1.0.0"
          ],
          "title": "Revision",
          "type": "string"
        },
        "model_path": {
          "description": "Relative path within the repository to model artifacts. Empty string indicates artifacts are in the repository root. Used for repositories with multiple models or nested structures.",
          "title": "Model Path",
          "type": "string"
        },
        "supported_devices": {
          "default": [
            "cpu",
            "cuda",
            "mps",
            "xpu"
          ],
          "description": "List of hardware accelerators supported by this model. The model can only run on devices in this list.",
          "items": {
            "$ref": "#/$defs/AcceleratorDevice"
          },
          "title": "Supported Devices",
          "type": "array"
        }
      },
      "required": [
        "name",
        "repo_id",
        "revision",
        "model_path"
      ],
      "title": "LayoutModelConfig",
      "type": "object"
    }
  },
  "description": "Options for layout processing.",
  "properties": {
    "keep_empty_clusters": {
      "default": false,
      "description": "Retain empty clusters in layout analysis results. When False, clusters without content are removed. Enable for debugging or when empty regions are semantically important.",
      "title": "Keep Empty Clusters",
      "type": "boolean"
    },
    "skip_cell_assignment": {
      "default": false,
      "description": "Skip assignment of cells to table structures during layout analysis. When True, cells are detected but not associated with tables. Use for performance optimization when table structure is not needed.",
      "title": "Skip Cell Assignment",
      "type": "boolean"
    },
    "create_orphan_clusters": {
      "default": true,
      "description": "Create clusters for orphaned elements not assigned to any structure. When True, isolated text or elements are grouped into their own clusters. Recommended for complete document coverage.",
      "title": "Create Orphan Clusters",
      "type": "boolean"
    },
    "model_spec": {
      "$ref": "#/$defs/LayoutModelConfig",
      "default": {
        "name": "docling_layout_heron",
        "repo_id": "docling-project/docling-layout-heron",
        "revision": "main",
        "model_path": "",
        "supported_devices": [
          "cpu",
          "cuda",
          "mps",
          "xpu"
        ]
      },
      "description": "Layout model configuration specifying which model to use for document layout analysis. Options include DOCLING_LAYOUT_HERON (default, balanced), DOCLING_LAYOUT_EGRET_* (higher accuracy), etc."
    }
  },
  "title": "LayoutOptions",
  "type": "object"
}

Fields:

create_orphan_clusters pydantic-field

create_orphan_clusters: bool

Create clusters for orphaned elements not assigned to any structure. When True, isolated text or elements are grouped into their own clusters. Recommended for complete document coverage.

keep_empty_clusters pydantic-field

keep_empty_clusters: bool

Retain empty clusters in layout analysis results. When False, clusters without content are removed. Enable for debugging or when empty regions are semantically important.

kind class-attribute

kind: str = 'docling_layout_default'

model_spec pydantic-field

model_spec: LayoutModelConfig

Layout model configuration specifying which model to use for document layout analysis. Options include DOCLING_LAYOUT_HERON (default, balanced), DOCLING_LAYOUT_EGRET_* (higher accuracy), etc.

skip_cell_assignment pydantic-field

skip_cell_assignment: bool

Skip assignment of cells to table structures during layout analysis. When True, cells are detected but not associated with tables. Use for performance optimization when table structure is not needed.
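
A sketch of selecting a non-default layout model, assuming the predefined specs (e.g. DOCLING_LAYOUT_EGRET_LARGE) live in docling.datamodel.layout_model_specs and that PdfPipelineOptions exposes a layout_options field, as in recent releases:

from docling.datamodel.layout_model_specs import DOCLING_LAYOUT_EGRET_LARGE
from docling.datamodel.pipeline_options import LayoutOptions, PdfPipelineOptions

pipeline_options = PdfPipelineOptions()
pipeline_options.layout_options = LayoutOptions(
    model_spec=DOCLING_LAYOUT_EGRET_LARGE,  # higher accuracy than the default heron model
    keep_empty_clusters=False,
)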

OcrAutoOptions pydantic-model

Bases: OcrOptions

Automatic OCR engine selection based on system availability.

Show JSON schema:
{
  "description": "Automatic OCR engine selection based on system availability.",
  "properties": {
    "lang": {
      "default": [],
      "description": "The automatic OCR engine will use the default values of the engine. Please specify the engine explicitly to change the language selection.",
      "items": {
        "type": "string"
      },
      "title": "Lang",
      "type": "array"
    },
    "force_full_page_ocr": {
      "default": false,
      "description": "If enabled, a full-page OCR is always applied.",
      "examples": [
        false
      ],
      "title": "Force Full Page Ocr",
      "type": "boolean"
    },
    "bitmap_area_threshold": {
      "default": 0.05,
      "description": "Percentage of the page area for a bitmap to be processed with OCR.",
      "examples": [
        0.05,
        0.1
      ],
      "title": "Bitmap Area Threshold",
      "type": "number"
    }
  },
  "title": "OcrAutoOptions",
  "type": "object"
}

Fields:

bitmap_area_threshold pydantic-field

bitmap_area_threshold: float

Percentage of the page area for a bitmap to be processed with OCR.

force_full_page_ocr pydantic-field

force_full_page_ocr: bool

If enabled, a full-page OCR is always applied.

kind class-attribute

kind: Literal['auto'] = 'auto'

lang pydantic-field

lang: list[str]

The automatic OCR engine will use the default values of the engine. Please specify the engine explicitly to change the language selection.

OcrEngine

Bases: str, Enum

Available OCR (Optical Character Recognition) engines for text extraction from images.

Each engine has different characteristics in terms of accuracy, speed, language support, and platform compatibility. Choose based on your specific requirements.

Attributes:

  • AUTO

    Automatically select the best available OCR engine based on platform and installed libraries.

  • EASYOCR

    Deep learning-based OCR supporting 80+ languages with GPU acceleration.

  • TESSERACT_CLI

    Tesseract OCR via command-line interface (requires system installation).

  • TESSERACT

    Tesseract OCR via Python bindings (tesserocr library).

  • OCRMAC

    Native macOS Vision framework OCR (Apple platforms only).

  • RAPIDOCR

    Lightweight OCR with multiple backend options (ONNX, OpenVINO, PaddlePaddle).

AUTO class-attribute instance-attribute

AUTO = 'auto'

EASYOCR class-attribute instance-attribute

EASYOCR = 'easyocr'

OCRMAC class-attribute instance-attribute

OCRMAC = 'ocrmac'

RAPIDOCR class-attribute instance-attribute

RAPIDOCR = 'rapidocr'

TESSERACT class-attribute instance-attribute

TESSERACT = 'tesseract'

TESSERACT_CLI class-attribute instance-attribute

TESSERACT_CLI = 'tesseract_cli'

OcrMacOptions pydantic-model

Bases: OcrOptions

Configuration for native macOS OCR using Vision framework.

Show JSON schema:
{
  "additionalProperties": false,
  "description": "Configuration for native macOS OCR using Vision framework.",
  "properties": {
    "lang": {
      "default": [
        "fr-FR",
        "de-DE",
        "es-ES",
        "en-US"
      ],
      "description": "List of language locale codes for macOS OCR. Use format `language-REGION` (e.g., `en-US`, `fr-FR`). Leverages native macOS Vision framework for OCR on Apple platforms.",
      "items": {
        "type": "string"
      },
      "title": "Lang",
      "type": "array"
    },
    "force_full_page_ocr": {
      "default": false,
      "description": "If enabled, a full-page OCR is always applied.",
      "examples": [
        false
      ],
      "title": "Force Full Page Ocr",
      "type": "boolean"
    },
    "bitmap_area_threshold": {
      "default": 0.05,
      "description": "Percentage of the page area for a bitmap to be processed with OCR.",
      "examples": [
        0.05,
        0.1
      ],
      "title": "Bitmap Area Threshold",
      "type": "number"
    },
    "recognition": {
      "default": "accurate",
      "description": "Recognition accuracy level. Options: `accurate` (higher quality, slower) or `fast` (lower quality, faster). Choose based on speed vs. accuracy requirements.",
      "title": "Recognition",
      "type": "string"
    },
    "framework": {
      "default": "vision",
      "description": "macOS framework to use for OCR. Currently supports `vision` (Apple Vision framework). Future versions may support additional frameworks.",
      "title": "Framework",
      "type": "string"
    }
  },
  "title": "OcrMacOptions",
  "type": "object"
}

Config:

  • extra: forbid

Fields:

bitmap_area_threshold pydantic-field

bitmap_area_threshold: float

Percentage of the page area for a bitmap to be processed with OCR.

force_full_page_ocr pydantic-field

force_full_page_ocr: bool

If enabled, a full-page OCR is always applied.

framework pydantic-field

framework: str

macOS framework to use for OCR. Currently supports vision (Apple Vision framework). Future versions may support additional frameworks.

kind class-attribute

kind: Literal['ocrmac'] = 'ocrmac'

lang pydantic-field

lang: list[str]

List of language locale codes for macOS OCR. Use format language-REGION (e.g., en-US, fr-FR). Leverages native macOS Vision framework for OCR on Apple platforms.

model_config class-attribute instance-attribute

model_config = ConfigDict(extra='forbid')

recognition pydantic-field

recognition: str

Recognition accuracy level. Options: accurate (higher quality, slower) or fast (lower quality, faster). Choose based on speed vs. accuracy requirements.
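
A sketch configuring macOS Vision OCR with the fields above (Apple platforms only; values are illustrative):

from docling.datamodel.pipeline_options import OcrMacOptions, PdfPipelineOptions

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.ocr_options = OcrMacOptions(
    lang=["en-US", "de-DE"],  # language-REGION locale codes
    recognition="fast",       # trade accuracy for speed
)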

OcrOptions pydantic-model

Bases: BaseOptions

OCR options.

Show JSON schema:
{
  "description": "OCR options.",
  "properties": {
    "lang": {
      "description": "List of OCR languages to use. The format must match the values of the OCR engine of choice.",
      "examples": [
        [
          "deu",
          "eng"
        ]
      ],
      "items": {
        "type": "string"
      },
      "title": "Lang",
      "type": "array"
    },
    "force_full_page_ocr": {
      "default": false,
      "description": "If enabled, a full-page OCR is always applied.",
      "examples": [
        false
      ],
      "title": "Force Full Page Ocr",
      "type": "boolean"
    },
    "bitmap_area_threshold": {
      "default": 0.05,
      "description": "Percentage of the page area for a bitmap to be processed with OCR.",
      "examples": [
        0.05,
        0.1
      ],
      "title": "Bitmap Area Threshold",
      "type": "number"
    }
  },
  "required": [
    "lang"
  ],
  "title": "OcrOptions",
  "type": "object"
}

Fields:

bitmap_area_threshold pydantic-field

bitmap_area_threshold: float

Percentage of the page area for a bitmap to be processed with OCR.

force_full_page_ocr pydantic-field

force_full_page_ocr: bool

If enabled, a full-page OCR is always applied.

kind class-attribute

kind: str

lang pydantic-field

lang: list[str]

List of OCR languages to use. The format must match the values of the OCR engine of choice.

PaginatedPipelineOptions pydantic-model

Bases: ConvertPipelineOptions

Configuration for pipelines processing paginated documents.

Show JSON schema:
{
  "$defs": {
    "AcceleratorDevice": {
      "description": "Devices to run model inference",
      "enum": [
        "auto",
        "cpu",
        "cuda",
        "mps",
        "xpu"
      ],
      "title": "AcceleratorDevice",
      "type": "string"
    },
    "AcceleratorOptions": {
      "additionalProperties": false,
      "description": "Hardware acceleration configuration for model inference.\n\nCan be configured via environment variables with DOCLING_ prefix.",
      "properties": {
        "num_threads": {
          "default": 4,
          "description": "Number of CPU threads to use for model inference. Higher values can improve throughput on multi-core systems but may increase memory usage. Can be set via DOCLING_NUM_THREADS or OMP_NUM_THREADS environment variables. Recommended: number of physical CPU cores.",
          "title": "Num Threads",
          "type": "integer"
        },
        "device": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "$ref": "#/$defs/AcceleratorDevice"
            }
          ],
          "default": "auto",
          "description": "Hardware device for model inference. Options: `auto` (automatic detection), `cpu` (CPU only), `cuda` (NVIDIA GPU), `cuda:N` (specific GPU), `mps` (Apple Silicon), `xpu` (Intel GPU). Auto mode selects the best available device. Can be set via DOCLING_DEVICE environment variable.",
          "title": "Device"
        },
        "cuda_use_flash_attention2": {
          "default": false,
          "description": "Enable Flash Attention 2 optimization for CUDA devices. Provides significant speedup and memory reduction for transformer models on compatible NVIDIA GPUs (Ampere or newer). Requires flash-attn package installation. Can be set via DOCLING_CUDA_USE_FLASH_ATTENTION2 environment variable.",
          "title": "Cuda Use Flash Attention2",
          "type": "boolean"
        }
      },
      "title": "AcceleratorOptions",
      "type": "object"
    },
    "PictureClassificationLabel": {
      "description": "PictureClassificationLabel.",
      "enum": [
        "other",
        "picture_group",
        "pie_chart",
        "bar_chart",
        "stacked_bar_chart",
        "line_chart",
        "flow_chart",
        "scatter_chart",
        "heatmap",
        "remote_sensing",
        "natural_image",
        "chemistry_molecular_structure",
        "chemistry_markush_structure",
        "icon",
        "logo",
        "signature",
        "stamp",
        "qr_code",
        "bar_code",
        "screenshot",
        "map",
        "stratigraphic_chart",
        "cad_drawing",
        "electrical_diagram"
      ],
      "title": "PictureClassificationLabel",
      "type": "string"
    },
    "PictureDescriptionBaseOptions": {
      "description": "Base configuration for picture description models.",
      "properties": {
        "batch_size": {
          "default": 8,
          "description": "Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.",
          "title": "Batch Size",
          "type": "integer"
        },
        "scale": {
          "default": 2.0,
          "description": "Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.",
          "title": "Scale",
          "type": "number"
        },
        "picture_area_threshold": {
          "default": 0.05,
          "description": "Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.",
          "title": "Picture Area Threshold",
          "type": "number"
        },
        "classification_allow": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/PictureClassificationLabel"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).",
          "title": "Classification Allow"
        },
        "classification_deny": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/PictureClassificationLabel"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, no picture types are denied unless not in allow list. Use to exclude unwanted image types (e.g., decorative images, logos).",
          "title": "Classification Deny"
        },
        "classification_min_confidence": {
          "default": 0.0,
          "description": "Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).",
          "title": "Classification Min Confidence",
          "type": "number"
        }
      },
      "title": "PictureDescriptionBaseOptions",
      "type": "object"
    }
  },
  "description": "Configuration for pipelines processing paginated documents.",
  "properties": {
    "document_timeout": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.",
      "examples": [
        10.0,
        20.0
      ],
      "title": "Document Timeout"
    },
    "accelerator_options": {
      "$ref": "#/$defs/AcceleratorOptions",
      "default": {
        "num_threads": 4,
        "device": "auto",
        "cuda_use_flash_attention2": false
      },
      "description": "Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models."
    },
    "enable_remote_services": {
      "default": false,
      "description": "Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.",
      "examples": [
        false
      ],
      "title": "Enable Remote Services",
      "type": "boolean"
    },
    "allow_external_plugins": {
      "default": false,
      "description": "Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.",
      "examples": [
        false
      ],
      "title": "Allow External Plugins",
      "type": "boolean"
    },
    "artifacts_path": {
      "anyOf": [
        {
          "format": "path",
          "type": "string"
        },
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use `docling-tools models download` to pre-fetch artifacts for offline operation or faster initialization.",
      "examples": [
        "./artifacts",
        "/tmp/docling_outputs"
      ],
      "title": "Artifacts Path"
    },
    "do_picture_classification": {
      "default": false,
      "description": "Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.",
      "title": "Do Picture Classification",
      "type": "boolean"
    },
    "do_picture_description": {
      "default": false,
      "description": "Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.",
      "title": "Do Picture Description",
      "type": "boolean"
    },
    "picture_description_options": {
      "$ref": "#/$defs/PictureDescriptionBaseOptions",
      "default": {
        "batch_size": 8,
        "scale": 2.0,
        "picture_area_threshold": 0.05,
        "classification_allow": null,
        "classification_deny": null,
        "classification_min_confidence": 0.0,
        "repo_id": "HuggingFaceTB/SmolVLM-256M-Instruct",
        "prompt": "Describe this image in a few sentences.",
        "generation_config": {
          "do_sample": false,
          "max_new_tokens": 200
        }
      },
      "description": "Configuration for picture description model. Specifies which vision model to use (API or inline) and model-specific parameters. Only applicable when `do_picture_description=True`."
    },
    "images_scale": {
      "default": 1.0,
      "description": "Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements. Recommended values: 1.0 (standard quality), 2.0 (high resolution), 0.5 (lower resolution for previews).",
      "title": "Images Scale",
      "type": "number"
    },
    "generate_page_images": {
      "default": false,
      "description": "Generate rendered page images during extraction. Creates PNG representations of each page for visual preview, validation, or downstream image-based machine learning tasks.",
      "title": "Generate Page Images",
      "type": "boolean"
    },
    "generate_picture_images": {
      "default": false,
      "description": "Extract and save embedded images from the document. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.",
      "title": "Generate Picture Images",
      "type": "boolean"
    }
  },
  "title": "PaginatedPipelineOptions",
  "type": "object"
}

Fields:

accelerator_options pydantic-field

accelerator_options: AcceleratorOptions

Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models.

allow_external_plugins pydantic-field

allow_external_plugins: bool

Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.

artifacts_path pydantic-field

artifacts_path: Optional[Union[Path, str]]

Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.

do_picture_classification pydantic-field

do_picture_classification: bool

Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.

do_picture_description pydantic-field

do_picture_description: bool

Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.

document_timeout pydantic-field

document_timeout: Optional[float]

Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.

enable_remote_services pydantic-field

enable_remote_services: bool

Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.

generate_page_images pydantic-field

generate_page_images: bool

Generate rendered page images during extraction. Creates PNG representations of each page for visual preview, validation, or downstream image-based machine learning tasks.

generate_picture_images pydantic-field

generate_picture_images: bool

Extract and save embedded images from the document. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.

images_scale pydantic-field

images_scale: float

Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements. Recommended values: 1.0 (standard quality), 2.0 (high resolution), 0.5 (lower resolution for previews).

kind class-attribute

kind: str

picture_description_options pydantic-field

picture_description_options: PictureDescriptionBaseOptions

Configuration for picture description model. Specifies which vision model to use (API or inline) and model-specific parameters. Only applicable when do_picture_description=True.
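
The image-generation fields above combine naturally; a minimal sketch using PdfPipelineOptions, the most common concrete subclass (all field names as documented above):

from docling.datamodel.pipeline_options import PdfPipelineOptions

# Render page previews and export embedded figures at double resolution.
pipeline_options = PdfPipelineOptions(
    generate_page_images=True,
    generate_picture_images=True,
    images_scale=2.0,  # 2.0 = high resolution, 1.0 = standard, 0.5 = previews
)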

PdfBackend

Bases: str, Enum

Available PDF parsing backends for document processing.

Different backends offer varying levels of text extraction quality, layout preservation, and processing speed. Choose based on your document complexity and quality requirements.

Attributes:

  • PYPDFIUM2

    Standard PDF parser using the PyPDFium2 library. Fast and reliable for basic text extraction.

  • DLPARSE_V1

    Docling Parse v1 backend with enhanced layout analysis and structure preservation.

  • DLPARSE_V2

    Docling Parse v2 backend with improved table detection and complex layout handling.

  • DLPARSE_V4

    Docling Parse v4 backend (latest) with advanced features and best accuracy for complex documents.

DLPARSE_V1 class-attribute instance-attribute

DLPARSE_V1 = 'dlparse_v1'

DLPARSE_V2 class-attribute instance-attribute

DLPARSE_V2 = 'dlparse_v2'

DLPARSE_V4 class-attribute instance-attribute

DLPARSE_V4 = 'dlparse_v4'

PYPDFIUM2 class-attribute instance-attribute

PYPDFIUM2 = 'pypdfium2'
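
The enum values above are mainly used when selecting a backend by name (for example from the command line). In the Python API, a backend is passed as a class to the PDF format options; a minimal sketch, assuming the backend module paths of recent Docling releases:

from docling.backend.docling_parse_v4_backend import DoclingParseV4DocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption

# Use the Docling Parse v4 backend (PdfBackend.DLPARSE_V4) for PDF inputs.
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(backend=DoclingParseV4DocumentBackend)
    }
)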

PdfPipelineOptions pydantic-model

Bases: PaginatedPipelineOptions

Configuration options for the PDF document processing pipeline.

Notes
  • Enabling multiple features (OCR, table structure, formulas) increases processing time significantly. Enable only the features your use case needs.
  • For production systems processing large document volumes, add timeout protection (for instance, 90-120 seconds via the document_timeout parameter); see the sketch below.
  • OCR requires the chosen engine to be installed (Tesseract, EasyOCR, etc.). Verify the installation before enabling OCR via do_ocr=True.
  • RapidOCR has known issues on read-only filesystems (e.g., Databricks). Consider Tesseract or another backend for distributed systems.
See Also
  • examples/pipeline_options_advanced.py: Comprehensive configuration examples.
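
A minimal configuration sketch reflecting the notes above (timeout protection, only the needed features enabled); the input filename is a placeholder:

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions(
    do_ocr=False,             # skip OCR for born-digital PDFs
    do_table_structure=True,  # keep table reconstruction
    document_timeout=120.0,   # abort with PARTIAL_SUCCESS after 120 s
)

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
result = converter.convert("report.pdf")
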
Show JSON schema:
{
  "$defs": {
    "AcceleratorDevice": {
      "description": "Devices to run model inference",
      "enum": [
        "auto",
        "cpu",
        "cuda",
        "mps",
        "xpu"
      ],
      "title": "AcceleratorDevice",
      "type": "string"
    },
    "AcceleratorOptions": {
      "additionalProperties": false,
      "description": "Hardware acceleration configuration for model inference.\n\nCan be configured via environment variables with DOCLING_ prefix.",
      "properties": {
        "num_threads": {
          "default": 4,
          "description": "Number of CPU threads to use for model inference. Higher values can improve throughput on multi-core systems but may increase memory usage. Can be set via DOCLING_NUM_THREADS or OMP_NUM_THREADS environment variables. Recommended: number of physical CPU cores.",
          "title": "Num Threads",
          "type": "integer"
        },
        "device": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "$ref": "#/$defs/AcceleratorDevice"
            }
          ],
          "default": "auto",
          "description": "Hardware device for model inference. Options: `auto` (automatic detection), `cpu` (CPU only), `cuda` (NVIDIA GPU), `cuda:N` (specific GPU), `mps` (Apple Silicon), `xpu` (Intel GPU). Auto mode selects the best available device. Can be set via DOCLING_DEVICE environment variable.",
          "title": "Device"
        },
        "cuda_use_flash_attention2": {
          "default": false,
          "description": "Enable Flash Attention 2 optimization for CUDA devices. Provides significant speedup and memory reduction for transformer models on compatible NVIDIA GPUs (Ampere or newer). Requires flash-attn package installation. Can be set via DOCLING_CUDA_USE_FLASH_ATTENTION2 environment variable.",
          "title": "Cuda Use Flash Attention2",
          "type": "boolean"
        }
      },
      "title": "AcceleratorOptions",
      "type": "object"
    },
    "BaseLayoutOptions": {
      "description": "Base options for layout models.",
      "properties": {
        "keep_empty_clusters": {
          "default": false,
          "description": "Retain empty clusters in layout analysis results. When False, clusters without content are removed. Enable for debugging or when empty regions are semantically important.",
          "title": "Keep Empty Clusters",
          "type": "boolean"
        },
        "skip_cell_assignment": {
          "default": false,
          "description": "Skip assignment of cells to table structures during layout analysis. When True, cells are detected but not associated with tables. Use for performance optimization when table structure is not needed.",
          "title": "Skip Cell Assignment",
          "type": "boolean"
        }
      },
      "title": "BaseLayoutOptions",
      "type": "object"
    },
    "BaseTableStructureOptions": {
      "description": "Base options for table structure models.",
      "properties": {},
      "title": "BaseTableStructureOptions",
      "type": "object"
    },
    "OcrOptions": {
      "description": "OCR options.",
      "properties": {
        "lang": {
          "description": "List of OCR languages to use. The format must match the values of the OCR engine of choice.",
          "examples": [
            [
              "deu",
              "eng"
            ]
          ],
          "items": {
            "type": "string"
          },
          "title": "Lang",
          "type": "array"
        },
        "force_full_page_ocr": {
          "default": false,
          "description": "If enabled, a full-page OCR is always applied.",
          "examples": [
            false
          ],
          "title": "Force Full Page Ocr",
          "type": "boolean"
        },
        "bitmap_area_threshold": {
          "default": 0.05,
          "description": "Percentage of the page area for a bitmap to be processed with OCR.",
          "examples": [
            0.05,
            0.1
          ],
          "title": "Bitmap Area Threshold",
          "type": "number"
        }
      },
      "required": [
        "lang"
      ],
      "title": "OcrOptions",
      "type": "object"
    },
    "PictureClassificationLabel": {
      "description": "PictureClassificationLabel.",
      "enum": [
        "other",
        "picture_group",
        "pie_chart",
        "bar_chart",
        "stacked_bar_chart",
        "line_chart",
        "flow_chart",
        "scatter_chart",
        "heatmap",
        "remote_sensing",
        "natural_image",
        "chemistry_molecular_structure",
        "chemistry_markush_structure",
        "icon",
        "logo",
        "signature",
        "stamp",
        "qr_code",
        "bar_code",
        "screenshot",
        "map",
        "stratigraphic_chart",
        "cad_drawing",
        "electrical_diagram"
      ],
      "title": "PictureClassificationLabel",
      "type": "string"
    },
    "PictureDescriptionBaseOptions": {
      "description": "Base configuration for picture description models.",
      "properties": {
        "batch_size": {
          "default": 8,
          "description": "Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.",
          "title": "Batch Size",
          "type": "integer"
        },
        "scale": {
          "default": 2.0,
          "description": "Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.",
          "title": "Scale",
          "type": "number"
        },
        "picture_area_threshold": {
          "default": 0.05,
          "description": "Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.",
          "title": "Picture Area Threshold",
          "type": "number"
        },
        "classification_allow": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/PictureClassificationLabel"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).",
          "title": "Classification Allow"
        },
        "classification_deny": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/PictureClassificationLabel"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, no picture types are denied unless not in allow list. Use to exclude unwanted image types (e.g., decorative images, logos).",
          "title": "Classification Deny"
        },
        "classification_min_confidence": {
          "default": 0.0,
          "description": "Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).",
          "title": "Classification Min Confidence",
          "type": "number"
        }
      },
      "title": "PictureDescriptionBaseOptions",
      "type": "object"
    }
  },
  "description": "Configuration options for the PDF document processing pipeline.\n\nNotes:\n    - Enabling multiple features (OCR, table structure, formulas) increases the processing time significantly.\n        Enable only necessary features for your use case.\n    - For production systems processing large document volumes, implement a timeout protection (for instance, 90-120\n        seconds via `document_timeout` parameter).\n    - OCR requires a system installation of engines (Tesseract, EasyOCR). Verify the installation before enabling\n        OCR via `do_ocr=True`.\n    - RapidOCR has known issues with read-only filesystems (e.g., Databricks). Consider Tesseract or alternative\n        backends for distributed systems.\n\nSee Also:\n    - `examples/pipeline_options_advanced.py`: Comprehensive configuration examples.",
  "properties": {
    "document_timeout": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.",
      "examples": [
        10.0,
        20.0
      ],
      "title": "Document Timeout"
    },
    "accelerator_options": {
      "$ref": "#/$defs/AcceleratorOptions",
      "default": {
        "num_threads": 4,
        "device": "auto",
        "cuda_use_flash_attention2": false
      },
      "description": "Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models."
    },
    "enable_remote_services": {
      "default": false,
      "description": "Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.",
      "examples": [
        false
      ],
      "title": "Enable Remote Services",
      "type": "boolean"
    },
    "allow_external_plugins": {
      "default": false,
      "description": "Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.",
      "examples": [
        false
      ],
      "title": "Allow External Plugins",
      "type": "boolean"
    },
    "artifacts_path": {
      "anyOf": [
        {
          "format": "path",
          "type": "string"
        },
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use `docling-tools models download` to pre-fetch artifacts for offline operation or faster initialization.",
      "examples": [
        "./artifacts",
        "/tmp/docling_outputs"
      ],
      "title": "Artifacts Path"
    },
    "do_picture_classification": {
      "default": false,
      "description": "Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.",
      "title": "Do Picture Classification",
      "type": "boolean"
    },
    "do_picture_description": {
      "default": false,
      "description": "Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.",
      "title": "Do Picture Description",
      "type": "boolean"
    },
    "picture_description_options": {
      "$ref": "#/$defs/PictureDescriptionBaseOptions",
      "default": {
        "batch_size": 8,
        "scale": 2.0,
        "picture_area_threshold": 0.05,
        "classification_allow": null,
        "classification_deny": null,
        "classification_min_confidence": 0.0,
        "repo_id": "HuggingFaceTB/SmolVLM-256M-Instruct",
        "prompt": "Describe this image in a few sentences.",
        "generation_config": {
          "do_sample": false,
          "max_new_tokens": 200
        }
      },
      "description": "Configuration for picture description model. Specifies which vision model to use (API or inline) and model-specific parameters. Only applicable when `do_picture_description=True`."
    },
    "images_scale": {
      "default": 1.0,
      "description": "Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements. Recommended values: 1.0 (standard quality), 2.0 (high resolution), 0.5 (lower resolution for previews).",
      "title": "Images Scale",
      "type": "number"
    },
    "generate_page_images": {
      "default": false,
      "description": "Generate rendered page images during extraction. Creates PNG representations of each page for visual preview, validation, or downstream image-based machine learning tasks.",
      "title": "Generate Page Images",
      "type": "boolean"
    },
    "generate_picture_images": {
      "default": false,
      "description": "Extract and save embedded images from the PDF. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.",
      "title": "Generate Picture Images",
      "type": "boolean"
    },
    "do_table_structure": {
      "default": true,
      "description": "Enable table structure extraction and reconstruction. Detects table regions, extracts cell content with row/column relationships, and reconstructs the logical table structure for downstream processing.",
      "title": "Do Table Structure",
      "type": "boolean"
    },
    "do_ocr": {
      "default": true,
      "description": "Enable Optical Character Recognition for scanned or image-based PDFs. Replaces or supplements programmatic text extraction with OCR-detected text. Required for scanned documents with no embedded text layer. Note: OCR significantly increases processing time.",
      "title": "Do Ocr",
      "type": "boolean"
    },
    "do_code_enrichment": {
      "default": false,
      "description": "Enable specialized processing for code blocks. Applies code-aware OCR and formatting to improve accuracy of programming language snippets, terminal output, and structured code content.",
      "title": "Do Code Enrichment",
      "type": "boolean"
    },
    "do_formula_enrichment": {
      "default": false,
      "description": "Enable mathematical formula recognition and LaTeX conversion. Uses specialized models to detect and extract mathematical expressions, converting them to LaTeX format for accurate representation.",
      "title": "Do Formula Enrichment",
      "type": "boolean"
    },
    "force_backend_text": {
      "default": false,
      "description": "Force use of PDF backend's native text extraction instead of layout model predictions. When enabled, bypasses the layout model's text detection and uses the embedded text from the PDF file directly. Useful for PDFs with reliable programmatic text layers.",
      "title": "Force Backend Text",
      "type": "boolean"
    },
    "table_structure_options": {
      "$ref": "#/$defs/BaseTableStructureOptions",
      "default": {
        "do_cell_matching": true,
        "mode": "accurate"
      },
      "description": "Configuration for table structure extraction. Controls table detection accuracy, cell matching behavior, and table formatting. Only applicable when `do_table_structure=True`."
    },
    "ocr_options": {
      "$ref": "#/$defs/OcrOptions",
      "default": {
        "lang": [],
        "force_full_page_ocr": false,
        "bitmap_area_threshold": 0.05
      },
      "description": "Configuration for OCR engine. Specifies which OCR engine to use (Tesseract, EasyOCR, RapidOCR, etc.) and engine-specific settings. Only applicable when `do_ocr=True`."
    },
    "layout_options": {
      "$ref": "#/$defs/BaseLayoutOptions",
      "default": {
        "keep_empty_clusters": false,
        "skip_cell_assignment": false,
        "create_orphan_clusters": true,
        "model_spec": {
          "model_path": "",
          "name": "docling_layout_heron",
          "repo_id": "docling-project/docling-layout-heron",
          "revision": "main",
          "supported_devices": [
            "cpu",
            "cuda",
            "mps",
            "xpu"
          ]
        }
      },
      "description": "Configuration for document layout analysis model. Controls layout detection behavior including cluster creation for orphaned elements, cell assignment to table structures, and handling of empty regions. Specifies which layout model to use (default: Heron)."
    },
    "generate_table_images": {
      "default": false,
      "deprecated": true,
      "title": "Generate Table Images",
      "type": "boolean"
    },
    "generate_parsed_pages": {
      "default": false,
      "description": "Retain intermediate parsed page representations after processing. When enabled, keeps detailed page-level parsing data structures for debugging or advanced post-processing. Increases memory usage. Automatically disabled after document assembly unless explicitly enabled.",
      "title": "Generate Parsed Pages",
      "type": "boolean"
    },
    "ocr_batch_size": {
      "default": 4,
      "description": "Batch size for OCR processing stage in threaded pipeline. Pages are grouped and processed together to improve throughput. Higher values increase GPU/CPU utilization but require more memory. Only used by `StandardPdfPipeline` (threaded mode).",
      "title": "Ocr Batch Size",
      "type": "integer"
    },
    "layout_batch_size": {
      "default": 4,
      "description": "Batch size for layout analysis stage in threaded pipeline. Pages are grouped and processed together by the layout model. Higher values improve throughput but increase memory usage. Only used by `StandardPdfPipeline` (threaded mode).",
      "title": "Layout Batch Size",
      "type": "integer"
    },
    "table_batch_size": {
      "default": 4,
      "description": "Batch size for table structure extraction stage in threaded pipeline. Tables from multiple pages are processed together. Higher values improve throughput but increase memory usage. Only used by `StandardPdfPipeline` (threaded mode).",
      "title": "Table Batch Size",
      "type": "integer"
    },
    "batch_polling_interval_seconds": {
      "default": 0.5,
      "description": "Polling interval in seconds for batch collection in threaded pipeline stages. Each stage waits up to this duration to accumulate items before processing. Lower values reduce latency but may decrease batching efficiency. Only used by `StandardPdfPipeline` (threaded mode).",
      "title": "Batch Polling Interval Seconds",
      "type": "number"
    },
    "queue_max_size": {
      "default": 100,
      "description": "Maximum queue size for inter-stage communication in threaded pipeline. Limits the number of items buffered between processing stages to prevent memory overflow. When full, upstream stages block until space is available. Only used by `StandardPdfPipeline` (threaded mode).",
      "title": "Queue Max Size",
      "type": "integer"
    }
  },
  "title": "PdfPipelineOptions",
  "type": "object"
}

Fields:

accelerator_options pydantic-field

accelerator_options: AcceleratorOptions

Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models.

allow_external_plugins pydantic-field

allow_external_plugins: bool

Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.

artifacts_path pydantic-field

artifacts_path: Optional[Union[Path, str]]

Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.

batch_polling_interval_seconds pydantic-field

batch_polling_interval_seconds: float

Polling interval in seconds for batch collection in threaded pipeline stages. Each stage waits up to this duration to accumulate items before processing. Lower values reduce latency but may decrease batching efficiency. Only used by StandardPdfPipeline (threaded mode).

do_code_enrichment pydantic-field

do_code_enrichment: bool

Enable specialized processing for code blocks. Applies code-aware OCR and formatting to improve accuracy of programming language snippets, terminal output, and structured code content.

do_formula_enrichment pydantic-field

do_formula_enrichment: bool

Enable mathematical formula recognition and LaTeX conversion. Uses specialized models to detect and extract mathematical expressions, converting them to LaTeX format for accurate representation.

do_ocr pydantic-field

do_ocr: bool

Enable Optical Character Recognition for scanned or image-based PDFs. Replaces or supplements programmatic text extraction with OCR-detected text. Required for scanned documents with no embedded text layer. Note: OCR significantly increases processing time.

do_picture_classification pydantic-field

do_picture_classification: bool

Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.

do_picture_description pydantic-field

do_picture_description: bool

Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.

do_table_structure pydantic-field

do_table_structure: bool

Enable table structure extraction and reconstruction. Detects table regions, extracts cell content with row/column relationships, and reconstructs the logical table structure for downstream processing.

document_timeout pydantic-field

document_timeout: Optional[float]

Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.

enable_remote_services pydantic-field

enable_remote_services: bool

Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.

force_backend_text pydantic-field

force_backend_text: bool

Force use of PDF backend's native text extraction instead of layout model predictions. When enabled, bypasses the layout model's text detection and uses the embedded text from the PDF file directly. Useful for PDFs with reliable programmatic text layers.

generate_page_images pydantic-field

generate_page_images: bool

Generate rendered page images during extraction. Creates PNG representations of each page for visual preview, validation, or downstream image-based machine learning tasks.

generate_parsed_pages pydantic-field

generate_parsed_pages: bool

Retain intermediate parsed page representations after processing. When enabled, keeps detailed page-level parsing data structures for debugging or advanced post-processing. Increases memory usage. Automatically disabled after document assembly unless explicitly enabled.

generate_picture_images pydantic-field

generate_picture_images: bool

Extract and save embedded images from the PDF. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.

generate_table_images pydantic-field

generate_table_images: bool

Deprecated; kept only for backward compatibility (marked deprecated in the JSON schema above).

images_scale pydantic-field

images_scale: float

Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements. Recommended values: 1.0 (standard quality), 2.0 (high resolution), 0.5 (lower resolution for previews).

kind class-attribute

kind: str

layout_batch_size pydantic-field

layout_batch_size: int

Batch size for layout analysis stage in threaded pipeline. Pages are grouped and processed together by the layout model. Higher values improve throughput but increase memory usage. Only used by StandardPdfPipeline (threaded mode).

layout_options pydantic-field

layout_options: BaseLayoutOptions

Configuration for document layout analysis model. Controls layout detection behavior including cluster creation for orphaned elements, cell assignment to table structures, and handling of empty regions. Specifies which layout model to use (default: Heron).

ocr_batch_size pydantic-field

ocr_batch_size: int

Batch size for OCR processing stage in threaded pipeline. Pages are grouped and processed together to improve throughput. Higher values increase GPU/CPU utilization but require more memory. Only used by StandardPdfPipeline (threaded mode).

ocr_options pydantic-field

ocr_options: OcrOptions

Configuration for OCR engine. Specifies which OCR engine to use (Tesseract, EasyOCR, RapidOCR, etc.) and engine-specific settings. Only applicable when do_ocr=True.

picture_description_options pydantic-field

picture_description_options: PictureDescriptionBaseOptions

Configuration for picture description model. Specifies which vision model to use (API or inline) and model-specific parameters. Only applicable when do_picture_description=True.

queue_max_size pydantic-field

queue_max_size: int

Maximum queue size for inter-stage communication in threaded pipeline. Limits the number of items buffered between processing stages to prevent memory overflow. When full, upstream stages block until space is available. Only used by StandardPdfPipeline (threaded mode).

table_batch_size pydantic-field

table_batch_size: int

Batch size for table structure extraction stage in threaded pipeline. Tables from multiple pages are processed together. Higher values improve throughput but increase memory usage. Only used by StandardPdfPipeline (threaded mode).

table_structure_options pydantic-field

table_structure_options: BaseTableStructureOptions

Configuration for table structure extraction. Controls table detection accuracy, cell matching behavior, and table formatting. Only applicable when do_table_structure=True.
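
For the nested options above, concrete option classes are swapped in. A sketch combining an OCR engine, table structure mode, and threaded-pipeline tuning (EasyOcrOptions and TableFormerMode names assumed from recent Docling releases):

from docling.datamodel.pipeline_options import (
    EasyOcrOptions,
    PdfPipelineOptions,
    TableFormerMode,
)

pipeline_options = PdfPipelineOptions(
    do_ocr=True,
    ocr_options=EasyOcrOptions(lang=["en"], force_full_page_ocr=False),
)

# Table structure: switch between fast and accurate TableFormer variants.
pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
pipeline_options.table_structure_options.do_cell_matching = True

# Threaded StandardPdfPipeline tuning (ignored outside threaded mode).
pipeline_options.layout_batch_size = 8
pipeline_options.queue_max_size = 50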

PictureDescriptionApiOptions pydantic-model

Bases: PictureDescriptionBaseOptions

Configuration for API-based picture description services.
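
A minimal sketch wiring an OpenAI-compatible endpoint into the PDF pipeline; the endpoint URL and bearer token are placeholders, and enable_remote_services=True is required for API-based description:

from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionApiOptions,
)

api_options = PictureDescriptionApiOptions(
    url="http://localhost:8000/v1/chat/completions",  # OpenAI-compatible endpoint
    headers={"Authorization": "Bearer TOKEN"},        # placeholder token
    prompt="Provide a technical description of this diagram.",
    timeout=30.0,
    concurrency=2,
)

pipeline_options = PdfPipelineOptions(
    do_picture_description=True,
    enable_remote_services=True,  # required for remote API calls
    picture_description_options=api_options,
)
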

Show JSON schema:
{
  "$defs": {
    "PictureClassificationLabel": {
      "description": "PictureClassificationLabel.",
      "enum": [
        "other",
        "picture_group",
        "pie_chart",
        "bar_chart",
        "stacked_bar_chart",
        "line_chart",
        "flow_chart",
        "scatter_chart",
        "heatmap",
        "remote_sensing",
        "natural_image",
        "chemistry_molecular_structure",
        "chemistry_markush_structure",
        "icon",
        "logo",
        "signature",
        "stamp",
        "qr_code",
        "bar_code",
        "screenshot",
        "map",
        "stratigraphic_chart",
        "cad_drawing",
        "electrical_diagram"
      ],
      "title": "PictureClassificationLabel",
      "type": "string"
    }
  },
  "description": "Configuration for API-based picture description services.",
  "properties": {
    "batch_size": {
      "default": 8,
      "description": "Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.",
      "title": "Batch Size",
      "type": "integer"
    },
    "scale": {
      "default": 2.0,
      "description": "Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.",
      "title": "Scale",
      "type": "number"
    },
    "picture_area_threshold": {
      "default": 0.05,
      "description": "Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.",
      "title": "Picture Area Threshold",
      "type": "number"
    },
    "classification_allow": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/PictureClassificationLabel"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).",
      "title": "Classification Allow"
    },
    "classification_deny": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/PictureClassificationLabel"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, no picture types are denied unless not in allow list. Use to exclude unwanted image types (e.g., decorative images, logos).",
      "title": "Classification Deny"
    },
    "classification_min_confidence": {
      "default": 0.0,
      "description": "Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).",
      "title": "Classification Min Confidence",
      "type": "number"
    },
    "url": {
      "default": "http://localhost:8000/v1/chat/completions",
      "description": "API endpoint URL for picture description service. Must be OpenAI-compatible chat completions endpoint. Default points to local server; update for cloud services or custom deployments.",
      "format": "uri",
      "minLength": 1,
      "title": "Url",
      "type": "string"
    },
    "headers": {
      "additionalProperties": {
        "type": "string"
      },
      "default": {},
      "description": "HTTP headers to include in API requests. Use for authentication or custom headers required by your API service.",
      "examples": [
        {
          "Authorization": "Bearer TOKEN"
        }
      ],
      "title": "Headers",
      "type": "object"
    },
    "params": {
      "additionalProperties": true,
      "default": {},
      "description": "Additional query parameters to include in API requests. Service-specific parameters for customizing API behavior beyond standard options.",
      "title": "Params",
      "type": "object"
    },
    "timeout": {
      "default": 20.0,
      "description": "Maximum time in seconds to wait for API response before timing out. Increase for slow networks or complex image descriptions. Recommended: 10-60 seconds.",
      "title": "Timeout",
      "type": "number"
    },
    "concurrency": {
      "default": 1,
      "description": "Number of concurrent API requests allowed. Higher values improve throughput but may hit API rate limits. Adjust based on API service quotas and network capacity.",
      "title": "Concurrency",
      "type": "integer"
    },
    "prompt": {
      "default": "Describe this image in a few sentences.",
      "description": "Prompt template sent to the vision model for image description. Customize to guide the model's output style, detail level, or focus.",
      "examples": [
        "Provide a technical description of this diagram"
      ],
      "title": "Prompt",
      "type": "string"
    },
    "provenance": {
      "default": "",
      "description": "Provenance information to track the source or method of picture descriptions. Used for metadata and auditing purposes in the output document.",
      "title": "Provenance",
      "type": "string"
    }
  },
  "title": "PictureDescriptionApiOptions",
  "type": "object"
}

Fields:

batch_size pydantic-field

batch_size: int

Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.

classification_allow pydantic-field

classification_allow: Optional[list[PictureClassificationLabel]]

List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).

classification_deny pydantic-field

classification_deny: Optional[list[PictureClassificationLabel]]

List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, nothing is explicitly denied; pictures outside the allow list (when set) are still skipped. Use to exclude unwanted image types (e.g., decorative images, logos).

classification_min_confidence pydantic-field

classification_min_confidence: float

Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).

concurrency pydantic-field

concurrency: int

Number of concurrent API requests allowed. Higher values improve throughput but may hit API rate limits. Adjust based on API service quotas and network capacity.

headers pydantic-field

headers: dict[str, str]

HTTP headers to include in API requests. Use for authentication or custom headers required by your API service.

kind class-attribute

kind: Literal['api'] = 'api'

params pydantic-field

params: dict[str, Any]

Additional query parameters to include in API requests. Service-specific parameters for customizing API behavior beyond standard options.

picture_area_threshold pydantic-field

picture_area_threshold: float

Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.

prompt pydantic-field

prompt: str

Prompt template sent to the vision model for image description. Customize to guide the model's output style, detail level, or focus.

provenance pydantic-field

provenance: str

Provenance information to track the source or method of picture descriptions. Used for metadata and auditing purposes in the output document.

scale pydantic-field

scale: float

Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.

timeout pydantic-field

timeout: float

Maximum time in seconds to wait for API response before timing out. Increase for slow networks or complex image descriptions. Recommended: 10-60 seconds.

url pydantic-field

url: AnyUrl

API endpoint URL for picture description service. Must be OpenAI-compatible chat completions endpoint. Default points to local server; update for cloud services or custom deployments.

PictureDescriptionBaseOptions pydantic-model

Bases: BaseOptions

Base configuration for picture description models.
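
This base class is not used directly; a concrete subclass supplies the model. A sketch of the classification filters, using the VLM subclass and assuming pydantic coerces the plain strings into PictureClassificationLabel (a str enum):

from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionVlmOptions,
)

options = PictureDescriptionVlmOptions(
    repo_id="HuggingFaceTB/SmolVLM-256M-Instruct",
    picture_area_threshold=0.01,           # also describe small images
    classification_deny=["logo", "icon"],  # skip decorative image types
    classification_min_confidence=0.6,
)

pipeline_options = PdfPipelineOptions(
    do_picture_classification=True,  # labels are required for the filters
    do_picture_description=True,
    picture_description_options=options,
)
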

Show JSON schema:
{
  "$defs": {
    "PictureClassificationLabel": {
      "description": "PictureClassificationLabel.",
      "enum": [
        "other",
        "picture_group",
        "pie_chart",
        "bar_chart",
        "stacked_bar_chart",
        "line_chart",
        "flow_chart",
        "scatter_chart",
        "heatmap",
        "remote_sensing",
        "natural_image",
        "chemistry_molecular_structure",
        "chemistry_markush_structure",
        "icon",
        "logo",
        "signature",
        "stamp",
        "qr_code",
        "bar_code",
        "screenshot",
        "map",
        "stratigraphic_chart",
        "cad_drawing",
        "electrical_diagram"
      ],
      "title": "PictureClassificationLabel",
      "type": "string"
    }
  },
  "description": "Base configuration for picture description models.",
  "properties": {
    "batch_size": {
      "default": 8,
      "description": "Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.",
      "title": "Batch Size",
      "type": "integer"
    },
    "scale": {
      "default": 2.0,
      "description": "Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.",
      "title": "Scale",
      "type": "number"
    },
    "picture_area_threshold": {
      "default": 0.05,
      "description": "Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.",
      "title": "Picture Area Threshold",
      "type": "number"
    },
    "classification_allow": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/PictureClassificationLabel"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).",
      "title": "Classification Allow"
    },
    "classification_deny": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/PictureClassificationLabel"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, no picture types are denied unless not in allow list. Use to exclude unwanted image types (e.g., decorative images, logos).",
      "title": "Classification Deny"
    },
    "classification_min_confidence": {
      "default": 0.0,
      "description": "Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).",
      "title": "Classification Min Confidence",
      "type": "number"
    }
  },
  "title": "PictureDescriptionBaseOptions",
  "type": "object"
}

Fields:

batch_size pydantic-field

batch_size: int

Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.

classification_allow pydantic-field

classification_allow: Optional[list[PictureClassificationLabel]]

List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).

classification_deny pydantic-field

classification_deny: Optional[list[PictureClassificationLabel]]

List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, nothing is explicitly denied; pictures outside the allow list (when set) are still skipped. Use to exclude unwanted image types (e.g., decorative images, logos).

classification_min_confidence pydantic-field

classification_min_confidence: float

Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).

kind class-attribute

kind: str

picture_area_threshold pydantic-field

picture_area_threshold: float

Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.

scale pydantic-field

scale: float

Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.

PictureDescriptionVlmOptions pydantic-model

Bases: PictureDescriptionBaseOptions

Configuration for inline vision-language models for picture description.
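
A sketch configuring a local vision-language model; repo_id and prompt are taken from the documented examples, and the model is downloaded from HuggingFace on first use unless artifacts_path is set:

from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionVlmOptions,
)

vlm_options = PictureDescriptionVlmOptions(
    repo_id="HuggingFaceTB/SmolVLM-256M-Instruct",  # any image-to-text HF model
    prompt="What is shown in this image?",
    generation_config={"max_new_tokens": 200, "do_sample": False},
)

pipeline_options = PdfPipelineOptions(
    do_picture_description=True,
    picture_description_options=vlm_options,
)
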

Show JSON schema:
{
  "$defs": {
    "PictureClassificationLabel": {
      "description": "PictureClassificationLabel.",
      "enum": [
        "other",
        "picture_group",
        "pie_chart",
        "bar_chart",
        "stacked_bar_chart",
        "line_chart",
        "flow_chart",
        "scatter_chart",
        "heatmap",
        "remote_sensing",
        "natural_image",
        "chemistry_molecular_structure",
        "chemistry_markush_structure",
        "icon",
        "logo",
        "signature",
        "stamp",
        "qr_code",
        "bar_code",
        "screenshot",
        "map",
        "stratigraphic_chart",
        "cad_drawing",
        "electrical_diagram"
      ],
      "title": "PictureClassificationLabel",
      "type": "string"
    }
  },
  "description": "Configuration for inline vision-language models for picture description.",
  "properties": {
    "batch_size": {
      "default": 8,
      "description": "Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.",
      "title": "Batch Size",
      "type": "integer"
    },
    "scale": {
      "default": 2.0,
      "description": "Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.",
      "title": "Scale",
      "type": "number"
    },
    "picture_area_threshold": {
      "default": 0.05,
      "description": "Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.",
      "title": "Picture Area Threshold",
      "type": "number"
    },
    "classification_allow": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/PictureClassificationLabel"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).",
      "title": "Classification Allow"
    },
    "classification_deny": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/PictureClassificationLabel"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, no picture types are denied unless not in allow list. Use to exclude unwanted image types (e.g., decorative images, logos).",
      "title": "Classification Deny"
    },
    "classification_min_confidence": {
      "default": 0.0,
      "description": "Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).",
      "title": "Classification Min Confidence",
      "type": "number"
    },
    "repo_id": {
      "description": "HuggingFace model repository ID for the vision-language model. Must be a model capable of image-to-text generation for picture descriptions.",
      "examples": [
        "HuggingFaceTB/SmolVLM-256M-Instruct",
        "ibm-granite/granite-vision-3.3-2b"
      ],
      "title": "Repo Id",
      "type": "string"
    },
    "prompt": {
      "default": "Describe this image in a few sentences.",
      "description": "Prompt template for the vision model. Customize to control description style, detail level, or focus.",
      "examples": [
        "What is shown in this image?",
        "Provide a detailed technical description"
      ],
      "title": "Prompt",
      "type": "string"
    },
    "generation_config": {
      "additionalProperties": true,
      "default": {
        "max_new_tokens": 200,
        "do_sample": false
      },
      "description": "HuggingFace generation configuration for text generation. Controls output length, sampling strategy, temperature, etc. See: https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationConfig",
      "title": "Generation Config",
      "type": "object"
    }
  },
  "required": [
    "repo_id"
  ],
  "title": "PictureDescriptionVlmOptions",
  "type": "object"
}

Fields:

batch_size pydantic-field

batch_size: int

Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.

classification_allow pydantic-field

classification_allow: Optional[list[PictureClassificationLabel]]

List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).

classification_deny pydantic-field

classification_deny: Optional[list[PictureClassificationLabel]]

List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, nothing is explicitly denied; pictures outside the allow list (when set) are still skipped. Use to exclude unwanted image types (e.g., decorative images, logos).

classification_min_confidence pydantic-field

classification_min_confidence: float

Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).

generation_config pydantic-field

generation_config: dict[str, Any]

HuggingFace generation configuration for text generation. Controls output length, sampling strategy, temperature, etc. See: https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationConfig

kind class-attribute

kind: Literal['vlm'] = 'vlm'

picture_area_threshold pydantic-field

picture_area_threshold: float

Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.

prompt pydantic-field

prompt: str

Prompt template for the vision model. Customize to control description style, detail level, or focus.

repo_cache_folder property

repo_cache_folder: str

repo_id pydantic-field

repo_id: str

HuggingFace model repository ID for the vision-language model. Must be a model capable of image-to-text generation for picture descriptions.

scale pydantic-field

scale: float

Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.

PipelineOptions pydantic-model

Bases: BaseOptions

Base configuration for document processing pipelines.
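
These base fields are shared by all pipelines. A sketch of an offline, GPU-accelerated setup (the accelerator import path is assumed from recent Docling releases):

from docling.datamodel.accelerator_options import AcceleratorDevice, AcceleratorOptions
from docling.datamodel.pipeline_options import PdfPipelineOptions

pipeline_options = PdfPipelineOptions(
    accelerator_options=AcceleratorOptions(
        num_threads=8,                  # roughly the number of physical CPU cores
        device=AcceleratorDevice.CUDA,  # or "cuda:0" for a specific GPU
    ),
    artifacts_path="./artifacts",  # pre-fetched via `docling-tools models download`
    enable_remote_services=False,  # stay fully offline
    document_timeout=120.0,
)
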

Show JSON schema:
{
  "$defs": {
    "AcceleratorDevice": {
      "description": "Devices to run model inference",
      "enum": [
        "auto",
        "cpu",
        "cuda",
        "mps",
        "xpu"
      ],
      "title": "AcceleratorDevice",
      "type": "string"
    },
    "AcceleratorOptions": {
      "additionalProperties": false,
      "description": "Hardware acceleration configuration for model inference.\n\nCan be configured via environment variables with DOCLING_ prefix.",
      "properties": {
        "num_threads": {
          "default": 4,
          "description": "Number of CPU threads to use for model inference. Higher values can improve throughput on multi-core systems but may increase memory usage. Can be set via DOCLING_NUM_THREADS or OMP_NUM_THREADS environment variables. Recommended: number of physical CPU cores.",
          "title": "Num Threads",
          "type": "integer"
        },
        "device": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "$ref": "#/$defs/AcceleratorDevice"
            }
          ],
          "default": "auto",
          "description": "Hardware device for model inference. Options: `auto` (automatic detection), `cpu` (CPU only), `cuda` (NVIDIA GPU), `cuda:N` (specific GPU), `mps` (Apple Silicon), `xpu` (Intel GPU). Auto mode selects the best available device. Can be set via DOCLING_DEVICE environment variable.",
          "title": "Device"
        },
        "cuda_use_flash_attention2": {
          "default": false,
          "description": "Enable Flash Attention 2 optimization for CUDA devices. Provides significant speedup and memory reduction for transformer models on compatible NVIDIA GPUs (Ampere or newer). Requires flash-attn package installation. Can be set via DOCLING_CUDA_USE_FLASH_ATTENTION2 environment variable.",
          "title": "Cuda Use Flash Attention2",
          "type": "boolean"
        }
      },
      "title": "AcceleratorOptions",
      "type": "object"
    }
  },
  "description": "Base configuration for document processing pipelines.",
  "properties": {
    "document_timeout": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.",
      "examples": [
        10.0,
        20.0
      ],
      "title": "Document Timeout"
    },
    "accelerator_options": {
      "$ref": "#/$defs/AcceleratorOptions",
      "default": {
        "num_threads": 4,
        "device": "auto",
        "cuda_use_flash_attention2": false
      },
      "description": "Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models."
    },
    "enable_remote_services": {
      "default": false,
      "description": "Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.",
      "examples": [
        false
      ],
      "title": "Enable Remote Services",
      "type": "boolean"
    },
    "allow_external_plugins": {
      "default": false,
      "description": "Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.",
      "examples": [
        false
      ],
      "title": "Allow External Plugins",
      "type": "boolean"
    },
    "artifacts_path": {
      "anyOf": [
        {
          "format": "path",
          "type": "string"
        },
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use `docling-tools models download` to pre-fetch artifacts for offline operation or faster initialization.",
      "examples": [
        "./artifacts",
        "/tmp/docling_outputs"
      ],
      "title": "Artifacts Path"
    }
  },
  "title": "PipelineOptions",
  "type": "object"
}

Fields:

accelerator_options pydantic-field

accelerator_options: AcceleratorOptions

Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models.

allow_external_plugins pydantic-field

allow_external_plugins: bool

Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.

artifacts_path pydantic-field

artifacts_path: Optional[Union[Path, str]]

Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.

document_timeout pydantic-field

document_timeout: Optional[float]

Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.

enable_remote_services pydantic-field

enable_remote_services: bool

Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.

kind class-attribute

kind: str
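
As a minimal sketch of how these base fields are used in practice, the snippet below sets them on the concrete `PdfPipelineOptions` subclass and passes the result to a `DocumentConverter`; the input file name is hypothetical, and the values mirror the recommendations in the field descriptions above.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Base PipelineOptions fields, configured on the PdfPipelineOptions subclass.
pipeline_options = PdfPipelineOptions(
    document_timeout=120.0,        # abort after 120 s, returning PARTIAL_SUCCESS
    artifacts_path="./artifacts",  # pre-fetched via `docling-tools models download`
    enable_remote_services=False,  # keep processing fully local
    allow_external_plugins=False,  # disallow third-party model plugins
)

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
result = converter.convert("report.pdf")  # hypothetical input file
```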

ProcessingPipeline

Bases: str, Enum

Available document processing pipeline types for different use cases.

Each pipeline is optimized for specific document types and processing requirements. Select the appropriate pipeline based on your input format and desired output.

Attributes:

  • LEGACY

    Legacy pipeline for backward compatibility with older document processing workflows.

  • STANDARD

    Standard pipeline for general document processing (PDF, DOCX, images, etc.) with layout analysis.

  • VLM

    Vision-Language Model pipeline for advanced document understanding using multimodal AI models.

  • ASR

    Automatic Speech Recognition pipeline for audio and video transcription to text.

ASR class-attribute instance-attribute

ASR = 'asr'

LEGACY class-attribute instance-attribute

LEGACY = 'legacy'

STANDARD class-attribute instance-attribute

STANDARD = 'standard'

VLM class-attribute instance-attribute

VLM = 'vlm'
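
Because the enum derives from `str`, its members compare equal to their string values and round-trip cleanly through config files or CLI flags. A small illustration, assuming the enum is imported from `docling.datamodel.pipeline_options` like the other types in this reference:

```python
from docling.datamodel.pipeline_options import ProcessingPipeline

# Members are str subclasses, so plain string comparison works.
assert ProcessingPipeline.STANDARD == "standard"
assert ProcessingPipeline.ASR.value == "asr"

# Parsing a user-supplied value (e.g. from a CLI flag) yields the member itself.
assert ProcessingPipeline("vlm") is ProcessingPipeline.VLM
```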

RapidOcrOptions pydantic-model

Bases: OcrOptions

Configuration for RapidOCR engine with multiple backend support.

See Also
  • https://rapidai.github.io/RapidOCRDocs/install_usage/api/RapidOCR/
  • https://rapidai.github.io/RapidOCRDocs/main/install_usage/rapidocr/usage/#__tabbed_3_4
Show JSON schema:
{
  "additionalProperties": false,
  "description": "Configuration for RapidOCR engine with multiple backend support.\n\nSee Also:\n    - https://rapidai.github.io/RapidOCRDocs/install_usage/api/RapidOCR/\n    - https://rapidai.github.io/RapidOCRDocs/main/install_usage/rapidocr/usage/#__tabbed_3_4",
  "properties": {
    "lang": {
      "default": [
        "english",
        "chinese"
      ],
      "description": "List of OCR languages. Note: RapidOCR does not currently support language selection; this parameter is reserved for future compatibility. See RapidOCR documentation for supported languages.",
      "items": {
        "type": "string"
      },
      "title": "Lang",
      "type": "array"
    },
    "force_full_page_ocr": {
      "default": false,
      "description": "If enabled, a full-page OCR is always applied.",
      "examples": [
        false
      ],
      "title": "Force Full Page Ocr",
      "type": "boolean"
    },
    "bitmap_area_threshold": {
      "default": 0.05,
      "description": "Percentage of the page area for a bitmap to be processed with OCR.",
      "examples": [
        0.05,
        0.1
      ],
      "title": "Bitmap Area Threshold",
      "type": "number"
    },
    "backend": {
      "default": "onnxruntime",
      "description": "Inference backend for RapidOCR. Options: `onnxruntime` (default, cross-platform), `openvino` (Intel), `paddle` (PaddlePaddle), `torch` (PyTorch). Choose based on your hardware and available libraries.",
      "enum": [
        "onnxruntime",
        "openvino",
        "paddle",
        "torch"
      ],
      "title": "Backend",
      "type": "string"
    },
    "text_score": {
      "default": 0.5,
      "description": "Minimum confidence score for text detection. Text regions with scores below this threshold are filtered out. Range: 0.0-1.0. Lower values detect more text but may include false positives.",
      "title": "Text Score",
      "type": "number"
    },
    "use_det": {
      "anyOf": [
        {
          "type": "boolean"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Enable text detection stage. If None, uses RapidOCR default behavior.",
      "title": "Use Det"
    },
    "use_cls": {
      "anyOf": [
        {
          "type": "boolean"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Enable text direction classification stage. If None, uses RapidOCR default behavior.",
      "title": "Use Cls"
    },
    "use_rec": {
      "anyOf": [
        {
          "type": "boolean"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Enable text recognition stage. If None, uses RapidOCR default behavior.",
      "title": "Use Rec"
    },
    "print_verbose": {
      "default": false,
      "description": "Enable verbose logging output from RapidOCR for debugging purposes.",
      "title": "Print Verbose",
      "type": "boolean"
    },
    "det_model_path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Custom path to text detection model. If None, uses default RapidOCR model.",
      "title": "Det Model Path"
    },
    "cls_model_path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Custom path to text classification model. If None, uses default RapidOCR model.",
      "title": "Cls Model Path"
    },
    "rec_model_path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Custom path to text recognition model. If None, uses default RapidOCR model.",
      "title": "Rec Model Path"
    },
    "rec_keys_path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Custom path to recognition keys file. If None, uses default RapidOCR keys.",
      "title": "Rec Keys Path"
    },
    "rec_font_path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "deprecated": true,
      "description": "Deprecated. Use font_path instead.",
      "title": "Rec Font Path"
    },
    "font_path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Custom path to font file for text rendering in visualization.",
      "title": "Font Path"
    },
    "rapidocr_params": {
      "additionalProperties": true,
      "default": {},
      "description": "Additional parameters to pass through to RapidOCR engine. Use this to override or extend default RapidOCR configuration with engine-specific options.",
      "title": "Rapidocr Params",
      "type": "object"
    }
  },
  "title": "RapidOcrOptions",
  "type": "object"
}

Config:

  • extra: forbid

Fields:

backend pydantic-field

backend: Literal['onnxruntime', 'openvino', 'paddle', 'torch']

Inference backend for RapidOCR. Options: onnxruntime (default, cross-platform), openvino (Intel), paddle (PaddlePaddle), torch (PyTorch). Choose based on your hardware and available libraries.

bitmap_area_threshold pydantic-field

bitmap_area_threshold: float

Percentage of the page area for a bitmap to be processed with OCR.

cls_model_path pydantic-field

cls_model_path: Optional[str]

Custom path to text classification model. If None, uses default RapidOCR model.

det_model_path pydantic-field

det_model_path: Optional[str]

Custom path to text detection model. If None, uses default RapidOCR model.

font_path pydantic-field

font_path: Optional[str]

Custom path to font file for text rendering in visualization.

force_full_page_ocr pydantic-field

force_full_page_ocr: bool

If enabled, a full-page OCR is always applied.

kind class-attribute

kind: Literal['rapidocr'] = 'rapidocr'

lang pydantic-field

lang: list[str]

List of OCR languages. Note: RapidOCR does not currently support language selection; this parameter is reserved for future compatibility. See RapidOCR documentation for supported languages.

model_config class-attribute instance-attribute

model_config = ConfigDict(extra='forbid')

print_verbose pydantic-field

print_verbose: bool

Enable verbose logging output from RapidOCR for debugging purposes.

rapidocr_params pydantic-field

rapidocr_params: dict[str, Any]

Additional parameters to pass through to RapidOCR engine. Use this to override or extend default RapidOCR configuration with engine-specific options.

rec_font_path pydantic-field

rec_font_path: Optional[str]

Deprecated. Use font_path instead.

rec_keys_path pydantic-field

rec_keys_path: Optional[str]

Custom path to recognition keys file. If None, uses default RapidOCR keys.

rec_model_path pydantic-field

rec_model_path: Optional[str]

Custom path to text recognition model. If None, uses default RapidOCR model.

text_score pydantic-field

text_score: float

Minimum confidence score for text detection. Text regions with scores below this threshold are filtered out. Range: 0.0-1.0. Lower values detect more text but may include false positives.

use_cls pydantic-field

use_cls: Optional[bool]

Enable text direction classification stage. If None, uses RapidOCR default behavior.

use_det pydantic-field

use_det: Optional[bool]

Enable text detection stage. If None, uses RapidOCR default behavior.

use_rec pydantic-field

use_rec: Optional[bool]

Enable text recognition stage. If None, uses RapidOCR default behavior.
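
A hedged configuration sketch: the snippet below selects RapidOCR as the OCR engine for the standard PDF pipeline using the usual `DocumentConverter` setup. The values shown mirror the documented defaults and are illustrative rather than recommendations.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

ocr_options = RapidOcrOptions(
    backend="onnxruntime",  # cross-platform default; openvino/paddle/torch also available
    text_score=0.5,         # drop text detections below this confidence
    use_cls=None,           # None keeps RapidOCR's own default for the stage
    rapidocr_params={},     # extra engine-specific overrides pass through here
)

pipeline_options = PdfPipelineOptions(do_ocr=True, ocr_options=ocr_options)
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
```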

TableFormerMode

Bases: str, Enum

Operating modes for TableFormer table structure extraction model.

Controls the trade-off between processing speed and extraction accuracy. Choose based on your performance requirements and document complexity.

Attributes:

  • FAST

    Fast mode prioritizes speed over precision. Suitable for simple tables or high-volume processing.

  • ACCURATE

    Accurate mode provides higher quality results with slower processing. Recommended for complex tables and production use.

ACCURATE class-attribute instance-attribute

ACCURATE = 'accurate'

FAST class-attribute instance-attribute

FAST = 'fast'

TableStructureOptions pydantic-model

Bases: BaseTableStructureOptions

Configuration for table structure extraction using the TableFormer model.

Show JSON schema:
{
  "$defs": {
    "TableFormerMode": {
      "description": "Operating modes for TableFormer table structure extraction model.\n\nControls the trade-off between processing speed and extraction accuracy.\nChoose based on your performance requirements and document complexity.\n\nAttributes:\n    FAST: Fast mode prioritizes speed over precision. Suitable for simple tables or high-volume\n        processing.\n    ACCURATE: Accurate mode provides higher quality results with slower processing. Recommended for complex\n        tables and production use.",
      "enum": [
        "fast",
        "accurate"
      ],
      "title": "TableFormerMode",
      "type": "string"
    }
  },
  "description": "Configuration for table structure extraction using the TableFormer model.",
  "properties": {
    "do_cell_matching": {
      "default": true,
      "description": "Enable cell matching to align detected table cells with their content. When enabled, the model attempts to match table structure predictions with actual cell content for improved accuracy.",
      "title": "Do Cell Matching",
      "type": "boolean"
    },
    "mode": {
      "$ref": "#/$defs/TableFormerMode",
      "default": "accurate",
      "description": "Table structure extraction mode. `accurate` provides higher quality results with slower processing, while `fast` prioritizes speed over precision. Recommended: `accurate` for production use."
    }
  },
  "title": "TableStructureOptions",
  "type": "object"
}

Fields:

do_cell_matching pydantic-field

do_cell_matching: bool

Enable cell matching to align detected table cells with their content. When enabled, the model attempts to match table structure predictions with actual cell content for improved accuracy.

kind class-attribute

kind: str = 'docling_tableformer'

mode pydantic-field

mode: TableFormerMode

Table structure extraction mode. accurate provides higher quality results with slower processing, while fast prioritizes speed over precision. Recommended: accurate for production use.
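
A short sketch combining `TableFormerMode` with these options in the standard PDF pipeline setup; the values shown are the documented defaults.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    TableFormerMode,
    TableStructureOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions(
    do_table_structure=True,
    table_structure_options=TableStructureOptions(
        mode=TableFormerMode.ACCURATE,  # slower, higher quality; FAST for high volume
        do_cell_matching=True,          # align predicted structure with actual cell content
    ),
)
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
```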

TesseractCliOcrOptions pydantic-model

Bases: OcrOptions

Configuration for Tesseract OCR via command-line interface.

Show JSON schema:
{
  "additionalProperties": false,
  "description": "Configuration for Tesseract OCR via command-line interface.",
  "properties": {
    "lang": {
      "default": [
        "fra",
        "deu",
        "spa",
        "eng"
      ],
      "description": "List of Tesseract language codes. Use 3-letter ISO 639-2 codes (e.g., `eng`, `fra`, `deu`). Multiple languages enable multilingual OCR. Requires corresponding Tesseract language data files.",
      "items": {
        "type": "string"
      },
      "title": "Lang",
      "type": "array"
    },
    "force_full_page_ocr": {
      "default": false,
      "description": "If enabled, a full-page OCR is always applied.",
      "examples": [
        false
      ],
      "title": "Force Full Page Ocr",
      "type": "boolean"
    },
    "bitmap_area_threshold": {
      "default": 0.05,
      "description": "Percentage of the page area for a bitmap to be processed with OCR.",
      "examples": [
        0.05,
        0.1
      ],
      "title": "Bitmap Area Threshold",
      "type": "number"
    },
    "tesseract_cmd": {
      "default": "tesseract",
      "description": "Command or path to Tesseract executable. Use `tesseract` if in system PATH, or provide full path for custom installations (e.g., `/usr/local/bin/tesseract`).",
      "title": "Tesseract Cmd",
      "type": "string"
    },
    "path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Path to Tesseract data directory containing language files. If None, uses Tesseract's default TESSDATA_PREFIX location.",
      "title": "Path"
    },
    "psm": {
      "anyOf": [
        {
          "type": "integer"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Page Segmentation Mode for Tesseract. Values 0-13 control how Tesseract segments the page. Common values: 3 (auto), 6 (uniform block), 11 (sparse text). If None, uses Tesseract default.",
      "title": "Psm"
    }
  },
  "title": "TesseractCliOcrOptions",
  "type": "object"
}

Config:

  • extra: forbid

Fields:

bitmap_area_threshold pydantic-field

bitmap_area_threshold: float

Percentage of the page area for a bitmap to be processed with OCR.

force_full_page_ocr pydantic-field

force_full_page_ocr: bool

If enabled, a full-page OCR is always applied.

kind class-attribute

kind: Literal['tesseract'] = 'tesseract'

lang pydantic-field

lang: list[str]

List of Tesseract language codes. Use 3-letter ISO 639-2 codes (e.g., eng, fra, deu). Multiple languages enable multilingual OCR. Requires corresponding Tesseract language data files.

model_config class-attribute instance-attribute

model_config = ConfigDict(extra='forbid')

path pydantic-field

path: Optional[str]

Path to Tesseract data directory containing language files. If None, uses Tesseract's default TESSDATA_PREFIX location.

psm pydantic-field

psm: Optional[int]

Page Segmentation Mode for Tesseract. Values 0-13 control how Tesseract segments the page. Common values: 3 (auto), 6 (uniform block), 11 (sparse text). If None, uses Tesseract default.

tesseract_cmd pydantic-field

tesseract_cmd: str

Command or path to Tesseract executable. Use tesseract if in system PATH, or provide full path for custom installations (e.g., /usr/local/bin/tesseract).
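
A minimal sketch wiring these options into the standard PDF pipeline. The language list and PSM value are illustrative, and the corresponding Tesseract language data files must be installed on the system.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, TesseractCliOcrOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

ocr_options = TesseractCliOcrOptions(
    lang=["eng", "deu"],        # 3-letter ISO 639-2 codes; data files must be installed
    tesseract_cmd="tesseract",  # or an absolute path for custom installations
    psm=6,                      # treat each page as a single uniform block of text
)

pipeline_options = PdfPipelineOptions(do_ocr=True, ocr_options=ocr_options)
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
```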

TesseractOcrOptions pydantic-model

Bases: OcrOptions

Configuration for Tesseract OCR via Python bindings (tesserocr).

Show JSON schema:
{
  "additionalProperties": false,
  "description": "Configuration for Tesseract OCR via Python bindings (tesserocr).",
  "properties": {
    "lang": {
      "default": [
        "fra",
        "deu",
        "spa",
        "eng"
      ],
      "description": "List of Tesseract language codes. Use 3-letter ISO 639-2 codes (e.g., `eng`, `fra`, `deu`). Multiple languages enable multilingual OCR. Requires corresponding Tesseract language data files.",
      "items": {
        "type": "string"
      },
      "title": "Lang",
      "type": "array"
    },
    "force_full_page_ocr": {
      "default": false,
      "description": "If enabled, a full-page OCR is always applied.",
      "examples": [
        false
      ],
      "title": "Force Full Page Ocr",
      "type": "boolean"
    },
    "bitmap_area_threshold": {
      "default": 0.05,
      "description": "Percentage of the page area for a bitmap to be processed with OCR.",
      "examples": [
        0.05,
        0.1
      ],
      "title": "Bitmap Area Threshold",
      "type": "number"
    },
    "path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Path to Tesseract data directory containing language files. If None, uses Tesseract's default TESSDATA_PREFIX location.",
      "title": "Path"
    },
    "psm": {
      "anyOf": [
        {
          "type": "integer"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Page Segmentation Mode for Tesseract. Values 0-13 control how Tesseract segments the page. Common values: 3 (auto), 6 (uniform block), 11 (sparse text). If None, uses Tesseract default.",
      "title": "Psm"
    }
  },
  "title": "TesseractOcrOptions",
  "type": "object"
}

Config:

  • extra: forbid

Fields:

bitmap_area_threshold pydantic-field

bitmap_area_threshold: float

Percentage of the page area for a bitmap to be processed with OCR.

force_full_page_ocr pydantic-field

force_full_page_ocr: bool

If enabled, a full-page OCR is always applied.

kind class-attribute

kind: Literal['tesserocr'] = 'tesserocr'

lang pydantic-field

lang: list[str]

List of Tesseract language codes. Use 3-letter ISO 639-2 codes (e.g., eng, fra, deu). Multiple languages enable multilingual OCR. Requires corresponding Tesseract language data files.

model_config class-attribute instance-attribute

model_config = ConfigDict(extra='forbid')

path pydantic-field

path: Optional[str]

Path to Tesseract data directory containing language files. If None, uses Tesseract's default TESSDATA_PREFIX location.

psm pydantic-field

psm: Optional[int]

Page Segmentation Mode for Tesseract. Values 0-13 control how Tesseract segments the page. Common values: 3 (auto), 6 (uniform block), 11 (sparse text). If None, uses Tesseract default.
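
The tesserocr-backed variant exposes the same knobs as the CLI variant but needs no `tesseract_cmd`, since it calls Tesseract through Python bindings. A brief sketch, assuming the `tesserocr` package is installed:

```python
from docling.datamodel.pipeline_options import TesseractOcrOptions

# Same configuration surface as TesseractCliOcrOptions, minus the executable path.
ocr_options = TesseractOcrOptions(
    lang=["eng"],
    psm=3,                    # automatic page segmentation
    force_full_page_ocr=True, # OCR the whole page, not just bitmap regions
)
```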

ThreadedPdfPipelineOptions pydantic-model

Bases: PdfPipelineOptions

Pipeline options for the threaded PDF pipeline with batching and backpressure control.

Show JSON schema:
{
  "$defs": {
    "AcceleratorDevice": {
      "description": "Devices to run model inference",
      "enum": [
        "auto",
        "cpu",
        "cuda",
        "mps",
        "xpu"
      ],
      "title": "AcceleratorDevice",
      "type": "string"
    },
    "AcceleratorOptions": {
      "additionalProperties": false,
      "description": "Hardware acceleration configuration for model inference.\n\nCan be configured via environment variables with DOCLING_ prefix.",
      "properties": {
        "num_threads": {
          "default": 4,
          "description": "Number of CPU threads to use for model inference. Higher values can improve throughput on multi-core systems but may increase memory usage. Can be set via DOCLING_NUM_THREADS or OMP_NUM_THREADS environment variables. Recommended: number of physical CPU cores.",
          "title": "Num Threads",
          "type": "integer"
        },
        "device": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "$ref": "#/$defs/AcceleratorDevice"
            }
          ],
          "default": "auto",
          "description": "Hardware device for model inference. Options: `auto` (automatic detection), `cpu` (CPU only), `cuda` (NVIDIA GPU), `cuda:N` (specific GPU), `mps` (Apple Silicon), `xpu` (Intel GPU). Auto mode selects the best available device. Can be set via DOCLING_DEVICE environment variable.",
          "title": "Device"
        },
        "cuda_use_flash_attention2": {
          "default": false,
          "description": "Enable Flash Attention 2 optimization for CUDA devices. Provides significant speedup and memory reduction for transformer models on compatible NVIDIA GPUs (Ampere or newer). Requires flash-attn package installation. Can be set via DOCLING_CUDA_USE_FLASH_ATTENTION2 environment variable.",
          "title": "Cuda Use Flash Attention2",
          "type": "boolean"
        }
      },
      "title": "AcceleratorOptions",
      "type": "object"
    },
    "BaseLayoutOptions": {
      "description": "Base options for layout models.",
      "properties": {
        "keep_empty_clusters": {
          "default": false,
          "description": "Retain empty clusters in layout analysis results. When False, clusters without content are removed. Enable for debugging or when empty regions are semantically important.",
          "title": "Keep Empty Clusters",
          "type": "boolean"
        },
        "skip_cell_assignment": {
          "default": false,
          "description": "Skip assignment of cells to table structures during layout analysis. When True, cells are detected but not associated with tables. Use for performance optimization when table structure is not needed.",
          "title": "Skip Cell Assignment",
          "type": "boolean"
        }
      },
      "title": "BaseLayoutOptions",
      "type": "object"
    },
    "BaseTableStructureOptions": {
      "description": "Base options for table structure models.",
      "properties": {},
      "title": "BaseTableStructureOptions",
      "type": "object"
    },
    "OcrOptions": {
      "description": "OCR options.",
      "properties": {
        "lang": {
          "description": "List of OCR languages to use. The format must match the values of the OCR engine of choice.",
          "examples": [
            [
              "deu",
              "eng"
            ]
          ],
          "items": {
            "type": "string"
          },
          "title": "Lang",
          "type": "array"
        },
        "force_full_page_ocr": {
          "default": false,
          "description": "If enabled, a full-page OCR is always applied.",
          "examples": [
            false
          ],
          "title": "Force Full Page Ocr",
          "type": "boolean"
        },
        "bitmap_area_threshold": {
          "default": 0.05,
          "description": "Percentage of the page area for a bitmap to be processed with OCR.",
          "examples": [
            0.05,
            0.1
          ],
          "title": "Bitmap Area Threshold",
          "type": "number"
        }
      },
      "required": [
        "lang"
      ],
      "title": "OcrOptions",
      "type": "object"
    },
    "PictureClassificationLabel": {
      "description": "PictureClassificationLabel.",
      "enum": [
        "other",
        "picture_group",
        "pie_chart",
        "bar_chart",
        "stacked_bar_chart",
        "line_chart",
        "flow_chart",
        "scatter_chart",
        "heatmap",
        "remote_sensing",
        "natural_image",
        "chemistry_molecular_structure",
        "chemistry_markush_structure",
        "icon",
        "logo",
        "signature",
        "stamp",
        "qr_code",
        "bar_code",
        "screenshot",
        "map",
        "stratigraphic_chart",
        "cad_drawing",
        "electrical_diagram"
      ],
      "title": "PictureClassificationLabel",
      "type": "string"
    },
    "PictureDescriptionBaseOptions": {
      "description": "Base configuration for picture description models.",
      "properties": {
        "batch_size": {
          "default": 8,
          "description": "Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.",
          "title": "Batch Size",
          "type": "integer"
        },
        "scale": {
          "default": 2.0,
          "description": "Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.",
          "title": "Scale",
          "type": "number"
        },
        "picture_area_threshold": {
          "default": 0.05,
          "description": "Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.",
          "title": "Picture Area Threshold",
          "type": "number"
        },
        "classification_allow": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/PictureClassificationLabel"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).",
          "title": "Classification Allow"
        },
        "classification_deny": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/PictureClassificationLabel"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, no picture types are denied unless not in allow list. Use to exclude unwanted image types (e.g., decorative images, logos).",
          "title": "Classification Deny"
        },
        "classification_min_confidence": {
          "default": 0.0,
          "description": "Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).",
          "title": "Classification Min Confidence",
          "type": "number"
        }
      },
      "title": "PictureDescriptionBaseOptions",
      "type": "object"
    }
  },
  "description": "Pipeline options for the threaded PDF pipeline with batching and backpressure control",
  "properties": {
    "document_timeout": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.",
      "examples": [
        10.0,
        20.0
      ],
      "title": "Document Timeout"
    },
    "accelerator_options": {
      "$ref": "#/$defs/AcceleratorOptions",
      "default": {
        "num_threads": 4,
        "device": "auto",
        "cuda_use_flash_attention2": false
      },
      "description": "Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models."
    },
    "enable_remote_services": {
      "default": false,
      "description": "Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.",
      "examples": [
        false
      ],
      "title": "Enable Remote Services",
      "type": "boolean"
    },
    "allow_external_plugins": {
      "default": false,
      "description": "Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.",
      "examples": [
        false
      ],
      "title": "Allow External Plugins",
      "type": "boolean"
    },
    "artifacts_path": {
      "anyOf": [
        {
          "format": "path",
          "type": "string"
        },
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use `docling-tools models download` to pre-fetch artifacts for offline operation or faster initialization.",
      "examples": [
        "./artifacts",
        "/tmp/docling_outputs"
      ],
      "title": "Artifacts Path"
    },
    "do_picture_classification": {
      "default": false,
      "description": "Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.",
      "title": "Do Picture Classification",
      "type": "boolean"
    },
    "do_picture_description": {
      "default": false,
      "description": "Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.",
      "title": "Do Picture Description",
      "type": "boolean"
    },
    "picture_description_options": {
      "$ref": "#/$defs/PictureDescriptionBaseOptions",
      "default": {
        "batch_size": 8,
        "scale": 2.0,
        "picture_area_threshold": 0.05,
        "classification_allow": null,
        "classification_deny": null,
        "classification_min_confidence": 0.0,
        "repo_id": "HuggingFaceTB/SmolVLM-256M-Instruct",
        "prompt": "Describe this image in a few sentences.",
        "generation_config": {
          "do_sample": false,
          "max_new_tokens": 200
        }
      },
      "description": "Configuration for picture description model. Specifies which vision model to use (API or inline) and model-specific parameters. Only applicable when `do_picture_description=True`."
    },
    "images_scale": {
      "default": 1.0,
      "description": "Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements. Recommended values: 1.0 (standard quality), 2.0 (high resolution), 0.5 (lower resolution for previews).",
      "title": "Images Scale",
      "type": "number"
    },
    "generate_page_images": {
      "default": false,
      "description": "Generate rendered page images during extraction. Creates PNG representations of each page for visual preview, validation, or downstream image-based machine learning tasks.",
      "title": "Generate Page Images",
      "type": "boolean"
    },
    "generate_picture_images": {
      "default": false,
      "description": "Extract and save embedded images from the PDF. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.",
      "title": "Generate Picture Images",
      "type": "boolean"
    },
    "do_table_structure": {
      "default": true,
      "description": "Enable table structure extraction and reconstruction. Detects table regions, extracts cell content with row/column relationships, and reconstructs the logical table structure for downstream processing.",
      "title": "Do Table Structure",
      "type": "boolean"
    },
    "do_ocr": {
      "default": true,
      "description": "Enable Optical Character Recognition for scanned or image-based PDFs. Replaces or supplements programmatic text extraction with OCR-detected text. Required for scanned documents with no embedded text layer. Note: OCR significantly increases processing time.",
      "title": "Do Ocr",
      "type": "boolean"
    },
    "do_code_enrichment": {
      "default": false,
      "description": "Enable specialized processing for code blocks. Applies code-aware OCR and formatting to improve accuracy of programming language snippets, terminal output, and structured code content.",
      "title": "Do Code Enrichment",
      "type": "boolean"
    },
    "do_formula_enrichment": {
      "default": false,
      "description": "Enable mathematical formula recognition and LaTeX conversion. Uses specialized models to detect and extract mathematical expressions, converting them to LaTeX format for accurate representation.",
      "title": "Do Formula Enrichment",
      "type": "boolean"
    },
    "force_backend_text": {
      "default": false,
      "description": "Force use of PDF backend's native text extraction instead of layout model predictions. When enabled, bypasses the layout model's text detection and uses the embedded text from the PDF file directly. Useful for PDFs with reliable programmatic text layers.",
      "title": "Force Backend Text",
      "type": "boolean"
    },
    "table_structure_options": {
      "$ref": "#/$defs/BaseTableStructureOptions",
      "default": {
        "do_cell_matching": true,
        "mode": "accurate"
      },
      "description": "Configuration for table structure extraction. Controls table detection accuracy, cell matching behavior, and table formatting. Only applicable when `do_table_structure=True`."
    },
    "ocr_options": {
      "$ref": "#/$defs/OcrOptions",
      "default": {
        "lang": [],
        "force_full_page_ocr": false,
        "bitmap_area_threshold": 0.05
      },
      "description": "Configuration for OCR engine. Specifies which OCR engine to use (Tesseract, EasyOCR, RapidOCR, etc.) and engine-specific settings. Only applicable when `do_ocr=True`."
    },
    "layout_options": {
      "$ref": "#/$defs/BaseLayoutOptions",
      "default": {
        "keep_empty_clusters": false,
        "skip_cell_assignment": false,
        "create_orphan_clusters": true,
        "model_spec": {
          "model_path": "",
          "name": "docling_layout_heron",
          "repo_id": "docling-project/docling-layout-heron",
          "revision": "main",
          "supported_devices": [
            "cpu",
            "cuda",
            "mps",
            "xpu"
          ]
        }
      },
      "description": "Configuration for document layout analysis model. Controls layout detection behavior including cluster creation for orphaned elements, cell assignment to table structures, and handling of empty regions. Specifies which layout model to use (default: Heron)."
    },
    "generate_table_images": {
      "default": false,
      "deprecated": true,
      "title": "Generate Table Images",
      "type": "boolean"
    },
    "generate_parsed_pages": {
      "default": false,
      "description": "Retain intermediate parsed page representations after processing. When enabled, keeps detailed page-level parsing data structures for debugging or advanced post-processing. Increases memory usage. Automatically disabled after document assembly unless explicitly enabled.",
      "title": "Generate Parsed Pages",
      "type": "boolean"
    },
    "ocr_batch_size": {
      "default": 4,
      "description": "Batch size for OCR processing stage in threaded pipeline. Pages are grouped and processed together to improve throughput. Higher values increase GPU/CPU utilization but require more memory. Only used by `StandardPdfPipeline` (threaded mode).",
      "title": "Ocr Batch Size",
      "type": "integer"
    },
    "layout_batch_size": {
      "default": 4,
      "description": "Batch size for layout analysis stage in threaded pipeline. Pages are grouped and processed together by the layout model. Higher values improve throughput but increase memory usage. Only used by `StandardPdfPipeline` (threaded mode).",
      "title": "Layout Batch Size",
      "type": "integer"
    },
    "table_batch_size": {
      "default": 4,
      "description": "Batch size for table structure extraction stage in threaded pipeline. Tables from multiple pages are processed together. Higher values improve throughput but increase memory usage. Only used by `StandardPdfPipeline` (threaded mode).",
      "title": "Table Batch Size",
      "type": "integer"
    },
    "batch_polling_interval_seconds": {
      "default": 0.5,
      "description": "Polling interval in seconds for batch collection in threaded pipeline stages. Each stage waits up to this duration to accumulate items before processing. Lower values reduce latency but may decrease batching efficiency. Only used by `StandardPdfPipeline` (threaded mode).",
      "title": "Batch Polling Interval Seconds",
      "type": "number"
    },
    "queue_max_size": {
      "default": 100,
      "description": "Maximum queue size for inter-stage communication in threaded pipeline. Limits the number of items buffered between processing stages to prevent memory overflow. When full, upstream stages block until space is available. Only used by `StandardPdfPipeline` (threaded mode).",
      "title": "Queue Max Size",
      "type": "integer"
    }
  },
  "title": "ThreadedPdfPipelineOptions",
  "type": "object"
}

Fields:

accelerator_options pydantic-field

accelerator_options: AcceleratorOptions

Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models.

allow_external_plugins pydantic-field

allow_external_plugins: bool

Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.

artifacts_path pydantic-field

artifacts_path: Optional[Union[Path, str]]

Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.

batch_polling_interval_seconds pydantic-field

batch_polling_interval_seconds: float

Polling interval in seconds for batch collection in threaded pipeline stages. Each stage waits up to this duration to accumulate items before processing. Lower values reduce latency but may decrease batching efficiency. Only used by StandardPdfPipeline (threaded mode).

do_code_enrichment pydantic-field

do_code_enrichment: bool

Enable specialized processing for code blocks. Applies code-aware OCR and formatting to improve accuracy of programming language snippets, terminal output, and structured code content.

do_formula_enrichment pydantic-field

do_formula_enrichment: bool

Enable mathematical formula recognition and LaTeX conversion. Uses specialized models to detect and extract mathematical expressions, converting them to LaTeX format for accurate representation.

do_ocr pydantic-field

do_ocr: bool

Enable Optical Character Recognition for scanned or image-based PDFs. Replaces or supplements programmatic text extraction with OCR-detected text. Required for scanned documents with no embedded text layer. Note: OCR significantly increases processing time.

do_picture_classification pydantic-field

do_picture_classification: bool

Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.

do_picture_description pydantic-field

do_picture_description: bool

Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.

do_table_structure pydantic-field

do_table_structure: bool

Enable table structure extraction and reconstruction. Detects table regions, extracts cell content with row/column relationships, and reconstructs the logical table structure for downstream processing.

document_timeout pydantic-field

document_timeout: Optional[float]

Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.

enable_remote_services pydantic-field

enable_remote_services: bool

Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.

force_backend_text pydantic-field

force_backend_text: bool

Force use of PDF backend's native text extraction instead of layout model predictions. When enabled, bypasses the layout model's text detection and uses the embedded text from the PDF file directly. Useful for PDFs with reliable programmatic text layers.

generate_page_images pydantic-field

generate_page_images: bool

Generate rendered page images during extraction. Creates PNG representations of each page for visual preview, validation, or downstream image-based machine learning tasks.

generate_parsed_pages pydantic-field

generate_parsed_pages: bool

Retain intermediate parsed page representations after processing. When enabled, keeps detailed page-level parsing data structures for debugging or advanced post-processing. Increases memory usage. Automatically disabled after document assembly unless explicitly enabled.

generate_picture_images pydantic-field

generate_picture_images: bool

Extract and save embedded images from the PDF. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.

generate_table_images pydantic-field

generate_table_images: bool

Deprecated.

images_scale pydantic-field

images_scale: float

Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements. Recommended values: 1.0 (standard quality), 2.0 (high resolution), 0.5 (lower resolution for previews).

kind class-attribute

kind: str

layout_batch_size pydantic-field

layout_batch_size: int

Batch size for layout analysis stage in threaded pipeline. Pages are grouped and processed together by the layout model. Higher values improve throughput but increase memory usage. Only used by StandardPdfPipeline (threaded mode).

layout_options pydantic-field

layout_options: BaseLayoutOptions

Configuration for document layout analysis model. Controls layout detection behavior including cluster creation for orphaned elements, cell assignment to table structures, and handling of empty regions. Specifies which layout model to use (default: Heron).

ocr_batch_size pydantic-field

ocr_batch_size: int

Batch size for OCR processing stage in threaded pipeline. Pages are grouped and processed together to improve throughput. Higher values increase GPU/CPU utilization but require more memory. Only used by StandardPdfPipeline (threaded mode).

ocr_options pydantic-field

ocr_options: OcrOptions

Configuration for OCR engine. Specifies which OCR engine to use (Tesseract, EasyOCR, RapidOCR, etc.) and engine-specific settings. Only applicable when do_ocr=True.

picture_description_options pydantic-field

picture_description_options: PictureDescriptionBaseOptions

Configuration for picture description model. Specifies which vision model to use (API or inline) and model-specific parameters. Only applicable when do_picture_description=True.

queue_max_size pydantic-field

queue_max_size: int

Maximum queue size for inter-stage communication in threaded pipeline. Limits the number of items buffered between processing stages to prevent memory overflow. When full, upstream stages block until space is available. Only used by StandardPdfPipeline (threaded mode).

table_batch_size pydantic-field

table_batch_size: int

Batch size for table structure extraction stage in threaded pipeline. Tables from multiple pages are processed together. Higher values improve throughput but increase memory usage. Only used by StandardPdfPipeline (threaded mode).

table_structure_options pydantic-field

table_structure_options: BaseTableStructureOptions

Configuration for table structure extraction. Controls table detection accuracy, cell matching behavior, and table formatting. Only applicable when do_table_structure=True.
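
A hedged sketch tuning the threaded batching knobs. Per the field descriptions above, these are consumed by `StandardPdfPipeline` in threaded mode, so the options object is passed through the usual `PdfFormatOption`; the batch sizes are illustrative, and the input file name is hypothetical.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import ThreadedPdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = ThreadedPdfPipelineOptions(
    layout_batch_size=8,   # larger batches raise throughput and memory use
    ocr_batch_size=8,
    table_batch_size=8,
    queue_max_size=100,    # backpressure: upstream stages block when the queue is full
    batch_polling_interval_seconds=0.5,
)

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
result = converter.convert("large_report.pdf")  # hypothetical input file
```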

VlmExtractionPipelineOptions pydantic-model

Bases: PipelineOptions

Options for extraction pipeline.

Show JSON schema:
{
  "$defs": {
    "AcceleratorDevice": {
      "description": "Devices to run model inference",
      "enum": [
        "auto",
        "cpu",
        "cuda",
        "mps",
        "xpu"
      ],
      "title": "AcceleratorDevice",
      "type": "string"
    },
    "AcceleratorOptions": {
      "additionalProperties": false,
      "description": "Hardware acceleration configuration for model inference.\n\nCan be configured via environment variables with DOCLING_ prefix.",
      "properties": {
        "num_threads": {
          "default": 4,
          "description": "Number of CPU threads to use for model inference. Higher values can improve throughput on multi-core systems but may increase memory usage. Can be set via DOCLING_NUM_THREADS or OMP_NUM_THREADS environment variables. Recommended: number of physical CPU cores.",
          "title": "Num Threads",
          "type": "integer"
        },
        "device": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "$ref": "#/$defs/AcceleratorDevice"
            }
          ],
          "default": "auto",
          "description": "Hardware device for model inference. Options: `auto` (automatic detection), `cpu` (CPU only), `cuda` (NVIDIA GPU), `cuda:N` (specific GPU), `mps` (Apple Silicon), `xpu` (Intel GPU). Auto mode selects the best available device. Can be set via DOCLING_DEVICE environment variable.",
          "title": "Device"
        },
        "cuda_use_flash_attention2": {
          "default": false,
          "description": "Enable Flash Attention 2 optimization for CUDA devices. Provides significant speedup and memory reduction for transformer models on compatible NVIDIA GPUs (Ampere or newer). Requires flash-attn package installation. Can be set via DOCLING_CUDA_USE_FLASH_ATTENTION2 environment variable.",
          "title": "Cuda Use Flash Attention2",
          "type": "boolean"
        }
      },
      "title": "AcceleratorOptions",
      "type": "object"
    },
    "InferenceFramework": {
      "enum": [
        "mlx",
        "transformers",
        "vllm"
      ],
      "title": "InferenceFramework",
      "type": "string"
    },
    "InlineVlmOptions": {
      "description": "Configuration for inline vision-language models running locally.",
      "properties": {
        "kind": {
          "const": "inline_model_options",
          "default": "inline_model_options",
          "title": "Kind",
          "type": "string"
        },
        "prompt": {
          "description": "Prompt template for the vision-language model. Guides the model's output format and content focus.",
          "title": "Prompt",
          "type": "string"
        },
        "scale": {
          "default": 2.0,
          "description": "Scaling factor for image resolution before processing. Higher values provide more detail but increase processing time and memory usage. Range: 0.5-4.0 typical.",
          "title": "Scale",
          "type": "number"
        },
        "max_size": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Maximum image dimension (width or height) in pixels. Images larger than this are resized while maintaining aspect ratio. If None, no size limit is enforced.",
          "title": "Max Size"
        },
        "temperature": {
          "default": 0.0,
          "description": "Sampling temperature for text generation. 0.0 uses greedy decoding (deterministic), higher values (e.g., 0.7-1.0) increase randomness. Recommended: 0.0 for consistent outputs.",
          "title": "Temperature",
          "type": "number"
        },
        "repo_id": {
          "description": "HuggingFace model repository ID for the vision-language model. Must be a model capable of processing images and generating text.",
          "examples": [
            "Qwen/Qwen2-VL-2B-Instruct",
            "ibm-granite/granite-vision-3.3-2b"
          ],
          "title": "Repo Id",
          "type": "string"
        },
        "revision": {
          "default": "main",
          "description": "Git revision (branch, tag, or commit hash) of the model repository. Allows pinning to specific model versions for reproducibility.",
          "examples": [
            "main",
            "v1.0.0"
          ],
          "title": "Revision",
          "type": "string"
        },
        "trust_remote_code": {
          "default": false,
          "description": "Allow execution of custom code from the model repository. Required for some models with custom architectures. Enable only for trusted sources due to security implications.",
          "title": "Trust Remote Code",
          "type": "boolean"
        },
        "load_in_8bit": {
          "default": true,
          "description": "Load model weights in 8-bit precision using bitsandbytes quantization. Reduces memory usage by ~50% with minimal accuracy loss. Requires bitsandbytes library and CUDA.",
          "title": "Load In 8Bit",
          "type": "boolean"
        },
        "llm_int8_threshold": {
          "default": 6.0,
          "description": "Threshold for LLM.int8() quantization outlier detection. Values with magnitude above this threshold are kept in float16 for accuracy. Lower values increase quantization but may reduce quality.",
          "title": "Llm Int8 Threshold",
          "type": "number"
        },
        "quantized": {
          "default": false,
          "description": "Indicates if the model is pre-quantized (e.g., GGUF, AWQ). When True, skips runtime quantization. Use for models already quantized during training or conversion.",
          "title": "Quantized",
          "type": "boolean"
        },
        "inference_framework": {
          "$ref": "#/$defs/InferenceFramework",
          "description": "Inference framework for running the VLM. Options: `transformers` (HuggingFace), `mlx` (Apple Silicon), `vllm` (high-throughput serving)."
        },
        "transformers_model_type": {
          "$ref": "#/$defs/TransformersModelType",
          "default": "automodel",
          "description": "HuggingFace Transformers model class to use. Options: `automodel` (auto-detect), `automodel-vision2seq` (vision-to-sequence), `automodel-causallm` (causal LM), `automodel-imagetexttotext` (image+text to text)."
        },
        "transformers_prompt_style": {
          "$ref": "#/$defs/TransformersPromptStyle",
          "default": "chat",
          "description": "Prompt formatting style for Transformers models. Options: `chat` (chat template), `raw` (raw text), `none` (no formatting). Use `chat` for instruction-tuned models."
        },
        "response_format": {
          "$ref": "#/$defs/ResponseFormat",
          "description": "Expected output format from the VLM. Options: `doctags` (structured tags), `markdown`, `html`, `otsl` (table structure), `plaintext`. Guides model output parsing."
        },
        "torch_dtype": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "PyTorch data type for model weights. Options: `float32`, `float16`, `bfloat16`. Lower precision reduces memory and increases speed. If None, uses model default.",
          "title": "Torch Dtype"
        },
        "supported_devices": {
          "default": [
            "cpu",
            "cuda",
            "mps",
            "xpu"
          ],
          "description": "List of hardware accelerators supported by this VLM configuration.",
          "items": {
            "$ref": "#/$defs/AcceleratorDevice"
          },
          "title": "Supported Devices",
          "type": "array"
        },
        "stop_strings": {
          "default": [],
          "description": "List of strings that trigger generation stopping when encountered. Used to prevent the model from generating beyond desired output boundaries.",
          "items": {
            "type": "string"
          },
          "title": "Stop Strings",
          "type": "array"
        },
        "custom_stopping_criteria": {
          "default": [],
          "description": "Custom stopping criteria objects for fine-grained control over generation termination. Allows implementing complex stopping logic beyond simple string matching.",
          "items": {
            "anyOf": []
          },
          "title": "Custom Stopping Criteria",
          "type": "array"
        },
        "extra_generation_config": {
          "additionalProperties": true,
          "default": {},
          "description": "Additional generation configuration parameters passed to the model. Overrides or extends default generation settings (e.g., top_p, top_k, repetition_penalty).",
          "title": "Extra Generation Config",
          "type": "object"
        },
        "extra_processor_kwargs": {
          "additionalProperties": true,
          "default": {},
          "description": "Additional keyword arguments passed to the image processor. Used for model-specific preprocessing options not covered by standard parameters.",
          "title": "Extra Processor Kwargs",
          "type": "object"
        },
        "use_kv_cache": {
          "default": true,
          "description": "Enable key-value caching for transformer attention. Significantly speeds up generation by caching attention computations. Disable only for debugging or memory-constrained scenarios.",
          "title": "Use Kv Cache",
          "type": "boolean"
        },
        "max_new_tokens": {
          "default": 4096,
          "description": "Maximum number of tokens to generate. Limits output length to prevent runaway generation. Adjust based on expected output size and memory constraints.",
          "title": "Max New Tokens",
          "type": "integer"
        },
        "track_generated_tokens": {
          "default": false,
          "description": "Track and store generated tokens during inference. Useful for debugging, analysis, or implementing custom post-processing. Increases memory usage.",
          "title": "Track Generated Tokens",
          "type": "boolean"
        },
        "track_input_prompt": {
          "default": false,
          "description": "Track and store the input prompt sent to the model. Useful for debugging, logging, or auditing. May contain sensitive information.",
          "title": "Track Input Prompt",
          "type": "boolean"
        }
      },
      "required": [
        "prompt",
        "repo_id",
        "inference_framework",
        "response_format"
      ],
      "title": "InlineVlmOptions",
      "type": "object"
    },
    "ResponseFormat": {
      "enum": [
        "doctags",
        "markdown",
        "deepseekocr_markdown",
        "html",
        "otsl",
        "plaintext"
      ],
      "title": "ResponseFormat",
      "type": "string"
    },
    "TransformersModelType": {
      "enum": [
        "automodel",
        "automodel-vision2seq",
        "automodel-causallm",
        "automodel-imagetexttotext"
      ],
      "title": "TransformersModelType",
      "type": "string"
    },
    "TransformersPromptStyle": {
      "enum": [
        "chat",
        "raw",
        "none"
      ],
      "title": "TransformersPromptStyle",
      "type": "string"
    }
  },
  "description": "Options for extraction pipeline.",
  "properties": {
    "document_timeout": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.",
      "examples": [
        10.0,
        20.0
      ],
      "title": "Document Timeout"
    },
    "accelerator_options": {
      "$ref": "#/$defs/AcceleratorOptions",
      "default": {
        "num_threads": 4,
        "device": "auto",
        "cuda_use_flash_attention2": false
      },
      "description": "Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models."
    },
    "enable_remote_services": {
      "default": false,
      "description": "Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.",
      "examples": [
        false
      ],
      "title": "Enable Remote Services",
      "type": "boolean"
    },
    "allow_external_plugins": {
      "default": false,
      "description": "Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.",
      "examples": [
        false
      ],
      "title": "Allow External Plugins",
      "type": "boolean"
    },
    "artifacts_path": {
      "anyOf": [
        {
          "format": "path",
          "type": "string"
        },
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use `docling-tools models download` to pre-fetch artifacts for offline operation or faster initialization.",
      "examples": [
        "./artifacts",
        "/tmp/docling_outputs"
      ],
      "title": "Artifacts Path"
    },
    "vlm_options": {
      "$ref": "#/$defs/InlineVlmOptions",
      "default": {
        "kind": "inline_model_options",
        "prompt": "",
        "scale": 2.0,
        "max_size": null,
        "temperature": 0.0,
        "repo_id": "numind/NuExtract-2.0-2B",
        "revision": "fe5b2f0b63b81150721435a3ca1129a75c59c74e",
        "trust_remote_code": false,
        "load_in_8bit": true,
        "llm_int8_threshold": 6.0,
        "quantized": false,
        "inference_framework": "transformers",
        "transformers_model_type": "automodel-imagetexttotext",
        "transformers_prompt_style": "chat",
        "response_format": "plaintext",
        "torch_dtype": "bfloat16",
        "supported_devices": [
          "cpu",
          "cuda",
          "mps",
          "xpu"
        ],
        "stop_strings": [],
        "custom_stopping_criteria": [],
        "extra_generation_config": {},
        "extra_processor_kwargs": {},
        "use_kv_cache": true,
        "max_new_tokens": 4096,
        "track_generated_tokens": false,
        "track_input_prompt": false
      },
      "description": "Vision-Language Model (VLM) configuration for structured information extraction. Specifies which VLM to use and its parameters for extracting structured data from documents using vision models."
    }
  },
  "title": "VlmExtractionPipelineOptions",
  "type": "object"
}

Fields:

accelerator_options pydantic-field

accelerator_options: AcceleratorOptions

Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models.
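
Example (a minimal sketch, not part of the generated reference): configuring accelerator options in code. The import path below follows recent Docling releases (docling.datamodel.accelerator_options); older versions exported these classes from this module instead.

from docling.datamodel.accelerator_options import (
    AcceleratorDevice,
    AcceleratorOptions,
)

# Pin inference to a specific device and thread count; the same settings can
# be supplied via the DOCLING_DEVICE and DOCLING_NUM_THREADS environment variables.
accelerator_options = AcceleratorOptions(
    num_threads=8,                    # recommended: number of physical CPU cores
    device=AcceleratorDevice.CUDA,    # or "cuda:0", AcceleratorDevice.MPS, ...
    cuda_use_flash_attention2=False,  # True requires the flash-attn package
)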

allow_external_plugins pydantic-field

allow_external_plugins: bool

Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.

artifacts_path pydantic-field

artifacts_path: Optional[Union[Path, str]]

Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.
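
Example (a hedged sketch of offline operation): pre-fetch artifacts once, then point the pipeline at the local directory. VlmExtractionPipelineOptions is assumed importable from this module, as documented on this page.

from pathlib import Path

from docling.datamodel.pipeline_options import VlmExtractionPipelineOptions

# Pre-fetch model weights once, e.g. from a shell:
#   docling-tools models download
opts = VlmExtractionPipelineOptions(
    # With a local artifacts path set, nothing is fetched at runtime.
    artifacts_path=Path("./artifacts"),
)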

document_timeout pydantic-field

document_timeout: Optional[float]

Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.

enable_remote_services pydantic-field

enable_remote_services: bool

Allow the pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.

kind class-attribute

kind: str

vlm_options pydantic-field

vlm_options: InlineVlmOptions

Vision-Language Model (VLM) configuration for structured information extraction. Specifies which VLM to use and its parameters for extracting structured data from documents using vision models.
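
Example (a sketch under assumed import paths; InlineVlmOptions and its enums follow the module layout of recent Docling releases): overriding the extraction VLM and setting a document timeout. The model values mirror the defaults shown in the schema above and are not a recommendation.

from docling.datamodel.pipeline_options import VlmExtractionPipelineOptions
from docling.datamodel.pipeline_options_vlm_model import (
    InferenceFramework,
    InlineVlmOptions,
    ResponseFormat,
)

extraction_options = VlmExtractionPipelineOptions(
    document_timeout=120.0,  # abort and return PARTIAL_SUCCESS after two minutes
    vlm_options=InlineVlmOptions(
        prompt="",  # default prompt from the schema above
        repo_id="numind/NuExtract-2.0-2B",  # documented default extraction model
        inference_framework=InferenceFramework.TRANSFORMERS,
        response_format=ResponseFormat.PLAINTEXT,
        temperature=0.0,  # greedy decoding for reproducible output
        max_new_tokens=4096,
    ),
)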

VlmPipelineOptions pydantic-model

Bases: PaginatedPipelineOptions

Pipeline configuration for vision-language model based document processing.

Show JSON schema:
{
  "$defs": {
    "AcceleratorDevice": {
      "description": "Devices to run model inference",
      "enum": [
        "auto",
        "cpu",
        "cuda",
        "mps",
        "xpu"
      ],
      "title": "AcceleratorDevice",
      "type": "string"
    },
    "AcceleratorOptions": {
      "additionalProperties": false,
      "description": "Hardware acceleration configuration for model inference.\n\nCan be configured via environment variables with DOCLING_ prefix.",
      "properties": {
        "num_threads": {
          "default": 4,
          "description": "Number of CPU threads to use for model inference. Higher values can improve throughput on multi-core systems but may increase memory usage. Can be set via DOCLING_NUM_THREADS or OMP_NUM_THREADS environment variables. Recommended: number of physical CPU cores.",
          "title": "Num Threads",
          "type": "integer"
        },
        "device": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "$ref": "#/$defs/AcceleratorDevice"
            }
          ],
          "default": "auto",
          "description": "Hardware device for model inference. Options: `auto` (automatic detection), `cpu` (CPU only), `cuda` (NVIDIA GPU), `cuda:N` (specific GPU), `mps` (Apple Silicon), `xpu` (Intel GPU). Auto mode selects the best available device. Can be set via DOCLING_DEVICE environment variable.",
          "title": "Device"
        },
        "cuda_use_flash_attention2": {
          "default": false,
          "description": "Enable Flash Attention 2 optimization for CUDA devices. Provides significant speedup and memory reduction for transformer models on compatible NVIDIA GPUs (Ampere or newer). Requires flash-attn package installation. Can be set via DOCLING_CUDA_USE_FLASH_ATTENTION2 environment variable.",
          "title": "Cuda Use Flash Attention2",
          "type": "boolean"
        }
      },
      "title": "AcceleratorOptions",
      "type": "object"
    },
    "InferenceFramework": {
      "enum": [
        "mlx",
        "transformers",
        "vllm"
      ],
      "title": "InferenceFramework",
      "type": "string"
    },
    "InlineVlmOptions": {
      "description": "Configuration for inline vision-language models running locally.",
      "properties": {
        "kind": {
          "const": "inline_model_options",
          "default": "inline_model_options",
          "title": "Kind",
          "type": "string"
        },
        "prompt": {
          "description": "Prompt template for the vision-language model. Guides the model's output format and content focus.",
          "title": "Prompt",
          "type": "string"
        },
        "scale": {
          "default": 2.0,
          "description": "Scaling factor for image resolution before processing. Higher values provide more detail but increase processing time and memory usage. Range: 0.5-4.0 typical.",
          "title": "Scale",
          "type": "number"
        },
        "max_size": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Maximum image dimension (width or height) in pixels. Images larger than this are resized while maintaining aspect ratio. If None, no size limit is enforced.",
          "title": "Max Size"
        },
        "temperature": {
          "default": 0.0,
          "description": "Sampling temperature for text generation. 0.0 uses greedy decoding (deterministic), higher values (e.g., 0.7-1.0) increase randomness. Recommended: 0.0 for consistent outputs.",
          "title": "Temperature",
          "type": "number"
        },
        "repo_id": {
          "description": "HuggingFace model repository ID for the vision-language model. Must be a model capable of processing images and generating text.",
          "examples": [
            "Qwen/Qwen2-VL-2B-Instruct",
            "ibm-granite/granite-vision-3.3-2b"
          ],
          "title": "Repo Id",
          "type": "string"
        },
        "revision": {
          "default": "main",
          "description": "Git revision (branch, tag, or commit hash) of the model repository. Allows pinning to specific model versions for reproducibility.",
          "examples": [
            "main",
            "v1.0.0"
          ],
          "title": "Revision",
          "type": "string"
        },
        "trust_remote_code": {
          "default": false,
          "description": "Allow execution of custom code from the model repository. Required for some models with custom architectures. Enable only for trusted sources due to security implications.",
          "title": "Trust Remote Code",
          "type": "boolean"
        },
        "load_in_8bit": {
          "default": true,
          "description": "Load model weights in 8-bit precision using bitsandbytes quantization. Reduces memory usage by ~50% with minimal accuracy loss. Requires bitsandbytes library and CUDA.",
          "title": "Load In 8Bit",
          "type": "boolean"
        },
        "llm_int8_threshold": {
          "default": 6.0,
          "description": "Threshold for LLM.int8() quantization outlier detection. Values with magnitude above this threshold are kept in float16 for accuracy. Lower values increase quantization but may reduce quality.",
          "title": "Llm Int8 Threshold",
          "type": "number"
        },
        "quantized": {
          "default": false,
          "description": "Indicates if the model is pre-quantized (e.g., GGUF, AWQ). When True, skips runtime quantization. Use for models already quantized during training or conversion.",
          "title": "Quantized",
          "type": "boolean"
        },
        "inference_framework": {
          "$ref": "#/$defs/InferenceFramework",
          "description": "Inference framework for running the VLM. Options: `transformers` (HuggingFace), `mlx` (Apple Silicon), `vllm` (high-throughput serving)."
        },
        "transformers_model_type": {
          "$ref": "#/$defs/TransformersModelType",
          "default": "automodel",
          "description": "HuggingFace Transformers model class to use. Options: `automodel` (auto-detect), `automodel-vision2seq` (vision-to-sequence), `automodel-causallm` (causal LM), `automodel-imagetexttotext` (image+text to text)."
        },
        "transformers_prompt_style": {
          "$ref": "#/$defs/TransformersPromptStyle",
          "default": "chat",
          "description": "Prompt formatting style for Transformers models. Options: `chat` (chat template), `raw` (raw text), `none` (no formatting). Use `chat` for instruction-tuned models."
        },
        "response_format": {
          "$ref": "#/$defs/ResponseFormat",
          "description": "Expected output format from the VLM. Options: `doctags` (structured tags), `markdown`, `html`, `otsl` (table structure), `plaintext`. Guides model output parsing."
        },
        "torch_dtype": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "PyTorch data type for model weights. Options: `float32`, `float16`, `bfloat16`. Lower precision reduces memory and increases speed. If None, uses model default.",
          "title": "Torch Dtype"
        },
        "supported_devices": {
          "default": [
            "cpu",
            "cuda",
            "mps",
            "xpu"
          ],
          "description": "List of hardware accelerators supported by this VLM configuration.",
          "items": {
            "$ref": "#/$defs/AcceleratorDevice"
          },
          "title": "Supported Devices",
          "type": "array"
        },
        "stop_strings": {
          "default": [],
          "description": "List of strings that trigger generation stopping when encountered. Used to prevent the model from generating beyond desired output boundaries.",
          "items": {
            "type": "string"
          },
          "title": "Stop Strings",
          "type": "array"
        },
        "custom_stopping_criteria": {
          "default": [],
          "description": "Custom stopping criteria objects for fine-grained control over generation termination. Allows implementing complex stopping logic beyond simple string matching.",
          "items": {
            "anyOf": []
          },
          "title": "Custom Stopping Criteria",
          "type": "array"
        },
        "extra_generation_config": {
          "additionalProperties": true,
          "default": {},
          "description": "Additional generation configuration parameters passed to the model. Overrides or extends default generation settings (e.g., top_p, top_k, repetition_penalty).",
          "title": "Extra Generation Config",
          "type": "object"
        },
        "extra_processor_kwargs": {
          "additionalProperties": true,
          "default": {},
          "description": "Additional keyword arguments passed to the image processor. Used for model-specific preprocessing options not covered by standard parameters.",
          "title": "Extra Processor Kwargs",
          "type": "object"
        },
        "use_kv_cache": {
          "default": true,
          "description": "Enable key-value caching for transformer attention. Significantly speeds up generation by caching attention computations. Disable only for debugging or memory-constrained scenarios.",
          "title": "Use Kv Cache",
          "type": "boolean"
        },
        "max_new_tokens": {
          "default": 4096,
          "description": "Maximum number of tokens to generate. Limits output length to prevent runaway generation. Adjust based on expected output size and memory constraints.",
          "title": "Max New Tokens",
          "type": "integer"
        },
        "track_generated_tokens": {
          "default": false,
          "description": "Track and store generated tokens during inference. Useful for debugging, analysis, or implementing custom post-processing. Increases memory usage.",
          "title": "Track Generated Tokens",
          "type": "boolean"
        },
        "track_input_prompt": {
          "default": false,
          "description": "Track and store the input prompt sent to the model. Useful for debugging, logging, or auditing. May contain sensitive information.",
          "title": "Track Input Prompt",
          "type": "boolean"
        }
      },
      "required": [
        "prompt",
        "repo_id",
        "inference_framework",
        "response_format"
      ],
      "title": "InlineVlmOptions",
      "type": "object"
    },
    "PictureClassificationLabel": {
      "description": "PictureClassificationLabel.",
      "enum": [
        "other",
        "picture_group",
        "pie_chart",
        "bar_chart",
        "stacked_bar_chart",
        "line_chart",
        "flow_chart",
        "scatter_chart",
        "heatmap",
        "remote_sensing",
        "natural_image",
        "chemistry_molecular_structure",
        "chemistry_markush_structure",
        "icon",
        "logo",
        "signature",
        "stamp",
        "qr_code",
        "bar_code",
        "screenshot",
        "map",
        "stratigraphic_chart",
        "cad_drawing",
        "electrical_diagram"
      ],
      "title": "PictureClassificationLabel",
      "type": "string"
    },
    "PictureDescriptionBaseOptions": {
      "description": "Base configuration for picture description models.",
      "properties": {
        "batch_size": {
          "default": 8,
          "description": "Number of images to process in a single batch during picture description. Higher values improve throughput but increase memory usage. Adjust based on available GPU/CPU memory.",
          "title": "Batch Size",
          "type": "integer"
        },
        "scale": {
          "default": 2.0,
          "description": "Scaling factor for image resolution before processing. Higher values (e.g., 2.0) provide more detail for the vision model but increase processing time and memory. Range: 0.5-4.0 typical.",
          "title": "Scale",
          "type": "number"
        },
        "picture_area_threshold": {
          "default": 0.05,
          "description": "Minimum picture area as fraction of page area (0.0-1.0) to trigger description. Pictures smaller than this threshold are skipped. Use lower values (e.g., 0.01) to describe small images.",
          "title": "Picture Area Threshold",
          "type": "number"
        },
        "classification_allow": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/PictureClassificationLabel"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of picture classification labels to allow for description. Only pictures classified with these labels will be processed. If None, all picture types are allowed unless explicitly denied. Use to focus description on specific image types (e.g., diagrams, charts).",
          "title": "Classification Allow"
        },
        "classification_deny": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/PictureClassificationLabel"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of picture classification labels to exclude from description. Pictures classified with these labels will be skipped. If None, no picture types are denied unless not in allow list. Use to exclude unwanted image types (e.g., decorative images, logos).",
          "title": "Classification Deny"
        },
        "classification_min_confidence": {
          "default": 0.0,
          "description": "Minimum classification confidence score (0.0-1.0) required for a picture to be processed. Pictures with classification confidence below this threshold are skipped. Higher values ensure only confidently classified images are described. Range: 0.0 (no filtering) to 1.0 (maximum confidence).",
          "title": "Classification Min Confidence",
          "type": "number"
        }
      },
      "title": "PictureDescriptionBaseOptions",
      "type": "object"
    },
    "ResponseFormat": {
      "enum": [
        "doctags",
        "markdown",
        "deepseekocr_markdown",
        "html",
        "otsl",
        "plaintext"
      ],
      "title": "ResponseFormat",
      "type": "string"
    },
    "TransformersModelType": {
      "enum": [
        "automodel",
        "automodel-vision2seq",
        "automodel-causallm",
        "automodel-imagetexttotext"
      ],
      "title": "TransformersModelType",
      "type": "string"
    },
    "TransformersPromptStyle": {
      "enum": [
        "chat",
        "raw",
        "none"
      ],
      "title": "TransformersPromptStyle",
      "type": "string"
    }
  },
  "description": "Pipeline configuration for vision-language model based document processing.",
  "properties": {
    "document_timeout": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.",
      "examples": [
        10.0,
        20.0
      ],
      "title": "Document Timeout"
    },
    "accelerator_options": {
      "$ref": "#/$defs/AcceleratorOptions",
      "default": {
        "num_threads": 4,
        "device": "auto",
        "cuda_use_flash_attention2": false
      },
      "description": "Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models."
    },
    "enable_remote_services": {
      "default": false,
      "description": "Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.",
      "examples": [
        false
      ],
      "title": "Enable Remote Services",
      "type": "boolean"
    },
    "allow_external_plugins": {
      "default": false,
      "description": "Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.",
      "examples": [
        false
      ],
      "title": "Allow External Plugins",
      "type": "boolean"
    },
    "artifacts_path": {
      "anyOf": [
        {
          "format": "path",
          "type": "string"
        },
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use `docling-tools models download` to pre-fetch artifacts for offline operation or faster initialization.",
      "examples": [
        "./artifacts",
        "/tmp/docling_outputs"
      ],
      "title": "Artifacts Path"
    },
    "do_picture_classification": {
      "default": false,
      "description": "Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.",
      "title": "Do Picture Classification",
      "type": "boolean"
    },
    "do_picture_description": {
      "default": false,
      "description": "Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.",
      "title": "Do Picture Description",
      "type": "boolean"
    },
    "picture_description_options": {
      "$ref": "#/$defs/PictureDescriptionBaseOptions",
      "default": {
        "batch_size": 8,
        "scale": 2.0,
        "picture_area_threshold": 0.05,
        "classification_allow": null,
        "classification_deny": null,
        "classification_min_confidence": 0.0,
        "repo_id": "HuggingFaceTB/SmolVLM-256M-Instruct",
        "prompt": "Describe this image in a few sentences.",
        "generation_config": {
          "do_sample": false,
          "max_new_tokens": 200
        }
      },
      "description": "Configuration for picture description model. Specifies which vision model to use (API or inline) and model-specific parameters. Only applicable when `do_picture_description=True`."
    },
    "images_scale": {
      "default": 1.0,
      "description": "Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements. Recommended values: 1.0 (standard quality), 2.0 (high resolution), 0.5 (lower resolution for previews).",
      "title": "Images Scale",
      "type": "number"
    },
    "generate_page_images": {
      "default": true,
      "description": "Generate page images for VLM processing. Required for vision-language models to analyze document pages. Automatically enabled in VLM pipeline.",
      "title": "Generate Page Images",
      "type": "boolean"
    },
    "generate_picture_images": {
      "default": false,
      "description": "Extract and save embedded images from the document. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.",
      "title": "Generate Picture Images",
      "type": "boolean"
    },
    "force_backend_text": {
      "default": false,
      "description": "Force use of backend's native text extraction instead of VLM predictions. When enabled, bypasses VLM text detection and uses embedded text from the document directly.",
      "title": "Force Backend Text",
      "type": "boolean"
    },
    "vlm_options": {
      "$ref": "#/$defs/InlineVlmOptions",
      "default": {
        "kind": "inline_model_options",
        "prompt": "Convert this page to docling.",
        "scale": 2.0,
        "max_size": null,
        "temperature": 0.0,
        "repo_id": "ibm-granite/granite-docling-258M",
        "revision": "main",
        "trust_remote_code": false,
        "load_in_8bit": true,
        "llm_int8_threshold": 6.0,
        "quantized": false,
        "inference_framework": "transformers",
        "transformers_model_type": "automodel-imagetexttotext",
        "transformers_prompt_style": "chat",
        "response_format": "doctags",
        "torch_dtype": null,
        "supported_devices": [
          "cpu",
          "cuda",
          "xpu"
        ],
        "stop_strings": [
          "</doctag>",
          "<|end_of_text|>"
        ],
        "custom_stopping_criteria": [],
        "extra_generation_config": {
          "skip_special_tokens": false
        },
        "extra_processor_kwargs": {},
        "use_kv_cache": true,
        "max_new_tokens": 8192,
        "track_generated_tokens": false,
        "track_input_prompt": false
      },
      "description": "Vision-Language Model configuration for document understanding. Specifies which VLM to use (inline or API) and model-specific parameters for vision-based document processing.",
      "title": "Vlm Options"
    }
  },
  "title": "VlmPipelineOptions",
  "type": "object"
}

Fields:

accelerator_options pydantic-field

accelerator_options: AcceleratorOptions

Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models.

allow_external_plugins pydantic-field

allow_external_plugins: bool

Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.

artifacts_path pydantic-field

artifacts_path: Optional[Union[Path, str]]

Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.

do_picture_classification pydantic-field

do_picture_classification: bool

Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.

do_picture_description pydantic-field

do_picture_description: bool

Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.

document_timeout pydantic-field

document_timeout: Optional[float]

Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.

enable_remote_services pydantic-field

enable_remote_services: bool

Allow the pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.

force_backend_text pydantic-field

force_backend_text: bool

Force use of backend's native text extraction instead of VLM predictions. When enabled, bypasses VLM text detection and uses embedded text from the document directly.

generate_page_images pydantic-field

generate_page_images: bool

Generate page images for VLM processing. Required for vision-language models to analyze document pages. Automatically enabled in VLM pipeline.

generate_picture_images pydantic-field

generate_picture_images: bool

Extract and save embedded images from the document. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.

images_scale pydantic-field

images_scale: float

Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements. Recommended values: 1.0 (standard quality), 2.0 (high resolution), 0.5 (lower resolution for previews).

kind class-attribute

kind: str

picture_description_options pydantic-field

picture_description_options: PictureDescriptionBaseOptions

Configuration for picture description model. Specifies which vision model to use (API or inline) and model-specific parameters. Only applicable when do_picture_description=True.
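
Example (a minimal sketch): enabling picture description with classification-based filtering. PictureDescriptionVlmOptions is assumed importable from this module, as the presets at the top of this page suggest; the label strings correspond to PictureClassificationLabel values from the schema above and are coerced by pydantic.

from docling.datamodel.pipeline_options import (
    PictureDescriptionVlmOptions,
    VlmPipelineOptions,
)

pipeline_options = VlmPipelineOptions(
    do_picture_classification=True,  # classification results drive the filters below
    do_picture_description=True,
    picture_description_options=PictureDescriptionVlmOptions(
        repo_id="HuggingFaceTB/SmolVLM-256M-Instruct",
        prompt="Describe this image in a few sentences.",
        picture_area_threshold=0.01,  # also describe small figures
        classification_allow=["bar_chart", "line_chart", "pie_chart"],
        classification_min_confidence=0.5,  # skip low-confidence classifications
    ),
)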

vlm_options pydantic-field

vlm_options: Union[InlineVlmOptions, ApiVlmOptions]

Vision-Language Model configuration for document understanding. Specifies which VLM to use (inline or API) and model-specific parameters for vision-based document processing.
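
Example (an end-to-end sketch following the wiring used in Docling's documented examples; class locations may shift between releases): running the VLM pipeline through a DocumentConverter.

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

# Defaults shown in the schema above: granite-docling-258M with doctags output.
pipeline_options = VlmPipelineOptions()

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        )
    }
)

result = converter.convert("report.pdf")  # hypothetical input file
print(result.document.export_to_markdown())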