CLI reference

This page documents Docling's command-line tools. It is generated by scripts/render_cli_reference.py from the live Typer apps; do not edit it by hand.

`docling`

Usage

docling [OPTIONS] source

Arguments

Name	Type	Required	Description
`source`	`text`	yes	Input files to convert (PDF or any other supported format, see `--from`). Can be local file or directory paths, or URLs.
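As a minimal sketch of the usage line above, the following Python snippet assembles an argument list for a basic conversion. The helper name `docling_command` and the example URL are hypothetical and not part of Docling; only the `docling` executable, the `source` argument, and the `--output` option come from this page.

```python
import shlex


def docling_command(source, output_dir="."):
    """Build an argv list for a basic conversion (hypothetical helper)."""
    # `source` may be a local file, a directory, or a URL, per the Arguments table.
    return ["docling", "--output", output_dir, source]


cmd = docling_command("https://example.com/paper.pdf", output_dir="out")
print(shlex.join(cmd))
# To actually run the conversion (requires Docling to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when a path contains spaces.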

Options

Name	Type	Default	Description
`--from`	`docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md`, `csv`, `xlsx`, `xml_uspto`, `xml_jats`, `xml_xbrl`, `mets_gbs`, `json_docling`, `audio`, `vtt`, `latex` (repeatable)		Input formats to accept. Defaults to all supported formats.
`--to`	`md`, `json`, `yaml`, `html`, `html_split_page`, `text`, `doctags`, `vtt` (repeatable)		Specify output formats. Defaults to Markdown.
`--show-layout` / `--no-show-layout`	flag	`false`	If enabled, the page images will show the bounding boxes of the items.
`--headers`	`text`		HTTP request headers to use when fetching URL input sources, given as a JSON string.
`--image-export-mode`	`placeholder`, `embedded`, `referenced`	`embedded`	Image export mode for image-capable document outputs (JSON, YAML, HTML, HTML split-page, and Markdown). Text, DocTags, and WebVTT outputs do not export images. In `placeholder` mode, only the position of the image is marked in the output. In `embedded` mode, the image is embedded as a base64-encoded string. In `referenced` mode, the image is exported in PNG format and referenced from the main exported document.
`--pipeline`	`legacy`, `standard`, `vlm`, `asr`	`standard`	Choose the pipeline to process PDF or image files.
`--vlm-model`	`text`	`granite_docling`	Choose the VLM preset to use with PDF or image files. Available presets: smoldocling, granite_docling, deepseek_ocr, granite_vision, pixtral, got_ocr, phi4, qwen, nanonets_ocr2, gemma_12b, gemma_27b, dolphin, glm_ocr, lightonocr, falcon_ocr
`--asr-model`	`whisper_tiny`, `whisper_small`, `whisper_medium`, `whisper_base`, `whisper_large`, `whisper_turbo`, `whisper_tiny_mlx`, `whisper_small_mlx`, `whisper_medium_mlx`, `whisper_base_mlx`, `whisper_large_mlx`, `whisper_turbo_mlx`, `whisper_tiny_native`, `whisper_small_native`, `whisper_medium_native`, `whisper_base_native`, `whisper_large_native`, `whisper_turbo_native`	`whisper_tiny`	Choose the ASR model to use with audio/video files.
`--ocr` / `--no-ocr`	flag	`true`	If enabled, the bitmap content will be processed using OCR.
`--force-ocr` / `--no-force-ocr`	flag	`false`	Replace any existing text with OCR generated text over the full content.
`--tables` / `--no-tables`	flag	`true`	If enabled, the table structure model will be used to extract table information.
`--ocr-engine`	`text`	`auto`	The OCR engine to use. When --allow-external-plugins is not set, the available values are: auto, easyocr, kserve_v2_ocr, ocrmac, rapidocr, tesserocr, tesseract. Use the option --show-external-plugins to see the options allowed with external plugins.
`--ocr-lang`	`text`		Provide a comma-separated list of languages used by the OCR engine. Note that each OCR engine has different values for the language names.
`--psm`	`integer`		Page Segmentation Mode for the OCR engine (0-13).
`--pdf-backend`	`pypdfium2`, `docling_parse`, `threaded_docling_parse`, `dlparse_v1`, `dlparse_v2`, `dlparse_v4`	`docling_parse`	The PDF backend to use.
`--pdf-password`	`text`		Password for protected PDF documents
`--table-mode`	`fast`, `accurate`	`accurate`	The mode to use in the table structure model.
`--enrich-code` / `--no-enrich-code`	flag	`false`	Enable the code enrichment model in the pipeline.
`--enrich-formula` / `--no-enrich-formula`	flag	`false`	Enable the formula enrichment model in the pipeline.
`--enrich-picture-classes` / `--no-enrich-picture-classes`	flag	`false`	Enable the picture classification enrichment model in the pipeline.
`--enrich-picture-description` / `--no-enrich-picture-description`	flag	`false`	Enable the picture description model in the pipeline.
`--enrich-chart-extraction` / `--no-enrich-chart-extraction`	flag	`false`	Enable chart data extraction from bar, pie, and line charts.
`--artifacts-path`	`path`		If provided, the location of the model artifacts.
`--enable-remote-services` / `--no-enable-remote-services`	flag	`false`	Must be enabled when using models connecting to remote services.
`--allow-external-plugins` / `--no-allow-external-plugins`	flag	`false`	Must be enabled for loading modules from third-party plugins.
`--show-external-plugins` / `--no-show-external-plugins`	flag	`false`	List the third-party plugins which are available when the option --allow-external-plugins is set.
`--abort-on-error` / `--no-abort-on-error`	flag	`false`	If enabled, the processing will be aborted when the first error is encountered.
`--output`	`path`	`.`	Output directory where results are saved.
`--verbose` / `-v`	`integer`	`0`	Set the verbosity level. -v for info logging, -vv for debug logging.
`--debug-visualize-cells` / `--no-debug-visualize-cells`	flag	`false`	Enable debug output which visualizes the PDF cells
`--debug-visualize-ocr` / `--no-debug-visualize-ocr`	flag	`false`	Enable debug output which visualizes the OCR cells
`--debug-visualize-layout` / `--no-debug-visualize-layout`	flag	`false`	Enable debug output which visualizes the layout clusters
`--debug-visualize-tables` / `--no-debug-visualize-tables`	flag	`false`	Enable debug output which visualizes the table cells
`--version`	flag		Show version information.
`--document-timeout`	`float`		The timeout for processing each document, in seconds.
`--num-threads`	`integer`	`4`	Number of threads
`--release-native-memory-every-n-pages`	`integer`	`128`	Release native parser memory after every N decoded pages when using the threaded docling-parse backend.
`--device`	`auto`, `cpu`, `cuda`, `mps`, `xpu`	`auto`	Accelerator device
`--logo`	flag		Show the Docling logo.
`--page-batch-size`	`integer`	`4`	Number of pages processed in one batch.
`--profiling` / `--no-profiling`	flag	`false`	If enabled, it summarizes profiling details for all conversion stages.
`--save-profiling` / `--no-save-profiling`	flag	`false`	If enabled, it saves the profiling summaries to json.
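The sketch below shows how a few of the options in the table combine: the repeatable `--to`, the JSON-string `--headers`, the paired `--ocr` / `--no-ocr` style flags, and the counted `-v` verbosity. The helper names `toggle` and `build_docling_args`, the example URL, and the token value are hypothetical; the option names and choice sets are copied from the table. It only builds the argument list and does not invoke Docling.

```python
import json
import shlex

# Choice sets copied from the options table above.
PIPELINES = {"legacy", "standard", "vlm", "asr"}
OUTPUT_FORMATS = {"md", "json", "yaml", "html", "html_split_page",
                  "text", "doctags", "vtt"}


def toggle(name, enabled):
    """Render a boolean option as --name or --no-name."""
    return f"--{name}" if enabled else f"--no-{name}"


def build_docling_args(source, to=("md",), pipeline="standard", headers=None,
                       ocr=True, tables=True, verbosity=0, output="."):
    """Assemble an argv list for `docling` (hypothetical helper, not part of Docling)."""
    if pipeline not in PIPELINES:
        raise ValueError(f"unknown pipeline: {pipeline}")
    args = ["docling", "--pipeline", pipeline]
    for fmt in to:
        if fmt not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output format: {fmt}")
        args += ["--to", fmt]  # --to is repeatable: one flag per format
    if headers:
        # --headers expects the headers serialized as one JSON string
        args += ["--headers", json.dumps(headers)]
    args += [toggle("ocr", ocr), toggle("tables", tables)]
    if verbosity > 0:
        args.append("-" + "v" * verbosity)  # -v for info, -vv for debug
    args += ["--output", output, source]
    return args


args = build_docling_args(
    "https://example.com/report.pdf",
    to=("md", "json"),
    headers={"Authorization": "Bearer TOKEN"},
    ocr=False,
    verbosity=2,
    output="out",
)
print(shlex.join(args))
```

Validating the choices before launching the process gives an early, readable error, although the CLI itself remains the authority on which values are accepted.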

`docling-tools`

Usage

docling-tools [OPTIONS] COMMAND [ARGS]...

Subcommands

Command	Description
`docling-tools models`

`docling-tools models`

Usage

docling-tools models [OPTIONS] COMMAND [ARGS]...

Subcommands

Command	Description
`docling-tools models download`
`docling-tools models download-hf-repo`

`docling-tools models download`

Usage

docling-tools models download [OPTIONS] [MODELS]:[layout|tableformer|tableformerv2|code_formula|picture_classifier|smolvlm|granitedocling|granitedocling_mlx|smoldocling|smoldocling_mlx|granite_vision|granite_chart_extraction|granite_chart_extraction_v4|rapidocr|easyocr]...

Arguments

Name	Type	Required	Description
`MODELS`	`layout`, `tableformer`, `tableformerv2`, `code_formula`, `picture_classifier`, `smolvlm`, `granitedocling`, `granitedocling_mlx`, `smoldocling`, `smoldocling_mlx`, `granite_vision`, `granite_chart_extraction`, `granite_chart_extraction_v4`, `rapidocr`, `easyocr`	no	Models to download (default behavior: a predefined set of models will be downloaded).

Options

Name	Type	Default	Description
`-o` / `--output-dir`	`path`	`~/.cache/docling/models`	The directory to download the models to.
`--force` / `--no-force`	flag	`false`	If true, the download will be forced.
`--all`	flag	`false`	If true, all available models will be downloaded (mutually exclusive with passing specific models).
`-q` / `--quiet`	flag	`false`	No extra output is generated; the CLI prints only the directory containing the cached models.
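The following sketch illustrates the constraint that `--all` cannot be combined with explicit model names. The helper `models_download_args` is hypothetical; the model names and option names are copied from the tables above. It only builds the argument list and does not download anything.

```python
# Model names copied from the MODELS argument table above.
KNOWN_MODELS = {
    "layout", "tableformer", "tableformerv2", "code_formula",
    "picture_classifier", "smolvlm", "granitedocling", "granitedocling_mlx",
    "smoldocling", "smoldocling_mlx", "granite_vision",
    "granite_chart_extraction", "granite_chart_extraction_v4",
    "rapidocr", "easyocr",
}


def models_download_args(models=(), all_models=False, output_dir=None,
                         force=False, quiet=False):
    """Assemble an argv list for `docling-tools models download` (hypothetical helper)."""
    models = list(models)
    if all_models and models:
        # --all is documented as mutually exclusive with specific models
        raise ValueError("--all cannot be combined with explicit model names")
    unknown = [m for m in models if m not in KNOWN_MODELS]
    if unknown:
        raise ValueError(f"unknown models: {unknown}")
    args = ["docling-tools", "models", "download"]
    if output_dir is not None:
        args += ["--output-dir", str(output_dir)]
    if force:
        args.append("--force")
    if quiet:
        args.append("--quiet")
    if all_models:
        args.append("--all")
    return args + models  # no names and no --all: the predefined set is downloaded


cmd = models_download_args(["layout", "tableformer"], output_dir="models", quiet=True)
print(" ".join(cmd))
```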

`docling-tools models download-hf-repo`

Usage

docling-tools models download-hf-repo [OPTIONS] MODELS...

Arguments

Name	Type	Required	Description
`MODELS`	`text`	yes	Specific models to download from HuggingFace, identified by their repo ID. For example: docling-project/docling-models.

Options

Name	Type	Default	Description
`-o` / `--output-dir`	`path`	`~/.cache/docling/models`	The directory to download the models to.
`--force` / `--no-force`	flag	`false`	If true, the download will be forced.
`-q` / `--quiet`	flag	`false`	No extra output is generated; the CLI prints only the directory containing the cached models.
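Because `--quiet` reduces the output to the models directory, the command lends itself to scripting. The sketch below builds the argument list and interprets the quiet output as a path. The helpers `hf_repo_download_args` and `parse_quiet_output` are hypothetical, and treating the output as exactly one path is an assumption based on the option description, not verified behavior.

```python
from pathlib import Path


def hf_repo_download_args(repo_ids, output_dir=None, force=False, quiet=True):
    """Assemble an argv list for `docling-tools models download-hf-repo` (hypothetical helper)."""
    repo_ids = list(repo_ids)
    if not repo_ids:
        raise ValueError("MODELS is required: pass at least one repo id")
    args = ["docling-tools", "models", "download-hf-repo"]
    if output_dir is not None:
        args += ["--output-dir", str(output_dir)]
    if force:
        args.append("--force")
    if quiet:
        args.append("--quiet")
    return args + repo_ids


def parse_quiet_output(stdout):
    """Interpret --quiet output as a single directory path (assumption)."""
    return Path(stdout.strip())


cmd = hf_repo_download_args(["docling-project/docling-models"], output_dir="models")
print(" ".join(cmd))
# To run it and capture the directory (requires Docling to be installed):
#   import subprocess
#   result = subprocess.run(cmd, check=True, capture_output=True, text=True)
#   models_dir = parse_quiet_output(result.stdout)
```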