CLI reference

This page documents our command-line tools.



Usage: `docling [OPTIONS] source`
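The snippet below is a minimal sketch of how the usage line maps onto an argument vector. It assembles a `docling` invocation in Python without running it. The helper `build_docling_command` and the input file `report.pdf` are hypothetical. Only `--to` and `--output` come from the options listed on this page.

```python
import shlex
from typing import Sequence


def build_docling_command(source: str, options: Sequence[str] = ()) -> list[str]:
    """Assemble argv for `docling [OPTIONS] source` (hypothetical helper)."""
    # Options first, then the positional source (a file path or URL).
    return ["docling", *options, source]


cmd = build_docling_command(
    "report.pdf",  # hypothetical input file
    ["--to", "md", "--output", "out"],
)
print(shlex.join(cmd))

# To execute (requires the docling CLI on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building a list instead of a single string avoids shell-quoting problems when paths contain spaces.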


Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `--from` | choice (docx, pptx, html, image, pdf, asciidoc, md, csv, xlsx, xml_uspto, xml_jats, json_docling) | Input formats to convert from. Defaults to all formats. | None |
| `--to` | choice (md, json, html, text, doctags) | Output formats. Defaults to Markdown. | None |
| `--headers` | text | HTTP request headers used when fetching URL input sources, given as a JSON string. | None |
| `--image-export-mode` | choice (placeholder, embedded, referenced) | Image export mode for the document (JSON, Markdown, and HTML output only). With `placeholder`, only the position of the image is marked in the output. With `embedded`, the image is embedded as a base64-encoded string. With `referenced`, the image is exported in PNG format and referenced from the main exported document. | `embedded` |
| `--ocr / --no-ocr` | boolean | If enabled, bitmap content is processed using OCR. | True |
| `--force-ocr / --no-force-ocr` | boolean | Replace any existing text with OCR-generated text over the full content. | False |
| `--ocr-engine` | choice (easyocr, tesseract_cli, tesseract, ocrmac, rapidocr) | The OCR engine to use. | `easyocr` |
| `--ocr-lang` | text | Comma-separated list of languages used by the OCR engine. Each OCR engine uses different values for the language names. | None |
| `--pdf-backend` | choice (pypdfium2, dlparse_v1, dlparse_v2) | The PDF backend to use. | `dlparse_v2` |
| `--table-mode` | choice (fast, accurate) | The mode used by the table structure model. | `accurate` |
| `--enrich-code / --no-enrich-code` | boolean | Enable the code enrichment model in the pipeline. | False |
| `--enrich-formula / --no-enrich-formula` | boolean | Enable the formula enrichment model in the pipeline. | False |
| `--enrich-picture-classes / --no-enrich-picture-classes` | boolean | Enable the picture classification enrichment model in the pipeline. | False |
| `--enrich-picture-description / --no-enrich-picture-description` | boolean | Enable the picture description model in the pipeline. | False |
| `--artifacts-path` | path | Location of the model artifacts, if provided. | None |
| `--enable-remote-services / --no-enable-remote-services` | boolean | Must be enabled when using models that connect to remote services. | False |
| `--abort-on-error / --no-abort-on-error` | boolean | If enabled, processing is aborted at the first error encountered. | False |
| `--output` | path | Output directory where results are saved. | `.` |
| `--verbose`, `-v` | integer | Verbosity level: `-v` for info logging, `-vv` for debug logging. | 0 |
| `--debug-visualize-cells / --no-debug-visualize-cells` | boolean | Enable debug output that visualizes the PDF cells. | False |
| `--debug-visualize-ocr / --no-debug-visualize-ocr` | boolean | Enable debug output that visualizes the OCR cells. | False |
| `--debug-visualize-layout / --no-debug-visualize-layout` | boolean | Enable debug output that visualizes the layout clusters. | False |
| `--debug-visualize-tables / --no-debug-visualize-tables` | boolean | Enable debug output that visualizes the table cells. | False |
| `--version` | boolean | Show version information. | None |
| `--document-timeout` | float | Timeout for processing each document, in seconds. | None |
| `--num-threads` | integer | Number of threads. | 4 |
| `--device` | choice (auto, cpu, cuda, mps) | Accelerator device. | `auto` |
| `--help` | boolean | Show this message and exit. | False |
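The `--headers` and `--ocr-lang` options each take a single string with its own internal format: a JSON object and a comma-separated list, respectively. Below is a minimal Python sketch that builds both values. The helper names, the placeholder token, the example URL, and the Tesseract-style language codes (`eng`, `deu`) are all assumptions for illustration and do not come from this page.

```python
import json
import shlex


def headers_option(headers: dict[str, str]) -> list[str]:
    # --headers expects the whole JSON object as one argument.
    return ["--headers", json.dumps(headers)]


def ocr_lang_option(langs: list[str]) -> list[str]:
    # --ocr-lang expects one comma-separated string; valid names depend on the engine.
    return ["--ocr-lang", ",".join(langs)]


cmd = [
    "docling",
    *headers_option({"Authorization": "Bearer <token>"}),  # placeholder token
    "--ocr-engine", "tesseract",
    *ocr_lang_option(["eng", "deu"]),  # assumed Tesseract-style language codes
    "https://example.com/scan.pdf",  # hypothetical URL source
]
# shlex.join quotes the JSON so it stays one word if pasted into a shell.
print(shlex.join(cmd))
```

Generating the JSON with `json.dumps` avoids hand-escaping quotes inside the header values.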
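Boolean options come in `--NAME` / `--no-NAME` pairs, and verbosity is set by repeating `-v`. The sketch below converts a hypothetical settings dictionary into those flags. The `flag` and `verbosity` helpers and the chosen settings are illustrative assumptions, not part of Docling. Because this page documents only `-v` and `-vv`, the sketch caps the verbosity level at two.

```python
def flag(name: str, enabled: bool) -> str:
    """Render one boolean option pair: --NAME enables, --no-NAME disables."""
    return f"--{name}" if enabled else f"--no-{name}"


def verbosity(level: int) -> list[str]:
    """Render -v (info) or -vv (debug); level 0 adds nothing."""
    return ["-" + "v" * min(level, 2)] if level > 0 else []


# Hypothetical settings; keys are option names without the leading dashes.
settings = {
    "ocr": True,
    "force-ocr": False,
    "enrich-code": True,
    "abort-on-error": True,
}

args = [flag(name, on) for name, on in settings.items()] + verbosity(2)
print(args)
# → ['--ocr', '--no-force-ocr', '--enrich-code', '--abort-on-error', '-vv']
```

Emitting the explicit `--no-` form for disabled options makes the intended value visible even when it matches the default.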