Plugins
Docling can be extended with third-party plugins, which add to the options available at several steps of the conversion pipeline.
Plugins are loaded via the pluggy system, which lets third-party developers register new capabilities through a setuptools entry point in the `docling` group.
The exact entry-point definition depends on the packaging system you use. Here are a few examples:
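With `pyproject.toml` (PEP 621 `[project]` table):

```toml
[project.entry-points."docling"]
your_plugin_name = "your_package.module"
```

With a Poetry (v1) `pyproject.toml`:

```toml
[tool.poetry.plugins."docling"]
your_plugin_name = "your_package.module"
```

With `setup.cfg`:

```ini
[options.entry_points]
docling =
    your_plugin_name = your_package.module
```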
With `setup.py`:

```python
from setuptools import setup

setup(
    # ...,
    entry_points={
        "docling": [
            # "name = module", with no quotes around the module path
            "your_plugin_name = your_package.module",
        ]
    },
)
```
- `your_plugin_name` is the name you choose for your plugin. It must be unique across the broader Docling ecosystem.
- `your_package.module` is the reference to the module in your package that is responsible for registering the plugin.
Plugin factories
OCR factory
The OCR factory lets a plugin provide additional OCR engines to Docling users.
The module `your_package.module` registers its OCR engines with code similar to:
```python
# Factory registration
def ocr_engines():
    return {
        "ocr_engines": [
            YourOcrModel,
        ]
    }
```
where `YourOcrModel` must implement `BaseOcrModel` and provide an options class derived from `OcrOptions`.
If you are looking for an example, the default Docling plugins are a good starting point.
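To make the registration contract concrete, here is a self-contained sketch of how a factory could consume the dictionary returned by `ocr_engines()` and index engines by name. It is an illustration under stated assumptions, not Docling's factory code: the stand-in classes do not derive from `BaseOcrModel`/`OcrOptions` so the sketch runs without Docling, and the `get_options_type()` classmethod, the `kind` attribute, the engine name `"your_ocr"`, and `collect_engines` are all assumed names for this example.

```python
from types import SimpleNamespace


# Hypothetical stand-ins: a real plugin would subclass Docling's OcrOptions
# and BaseOcrModel instead of these plain classes.
class YourOcrOptions:
    kind = "your_ocr"  # assumed engine name, e.g. what a CLI flag would select


class YourOcrModel:
    @classmethod
    def get_options_type(cls) -> type:
        # Assumed hook linking a model to its options class.
        return YourOcrOptions


# Factory registration, as in the plugin module above.
def ocr_engines():
    return {
        "ocr_engines": [
            YourOcrModel,
        ]
    }


def collect_engines(plugin_module) -> dict[str, type]:
    """Build a name -> model-class registry from one plugin module (sketch)."""
    registry: dict[str, type] = {}
    hook = getattr(plugin_module, "ocr_engines", None)
    if not callable(hook):
        return registry  # this module registers no OCR engines
    for model_cls in hook().get("ocr_engines", []):
        options_cls = model_cls.get_options_type()
        registry[options_cls.kind] = model_cls
    return registry


# Simulate a loaded plugin module that exposes the registration function.
plugin = SimpleNamespace(ocr_engines=ocr_engines)
print(collect_engines(plugin))
```

Keying the registry on the options class is one way to let a user's choice of options select the engine; the sketch only shows the shape of the data flowing from `ocr_engines()`.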
Third-party plugins
When a plugin is provided by a third-party package rather than by the main `docling` package, it has to be enabled explicitly via the `allow_external_plugins` option.
```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.allow_external_plugins = True  # <-- enable external plugins
# YourOptions is a placeholder for your plugin's OcrOptions subclass;
# assign an instance of it, not the class itself.
pipeline_options.ocr_options = YourOptions()  # <-- your options here

doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options
        )
    }
)
```
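Conceptually, the option acts as a filter on where a plugin's registration module comes from. The following is a hypothetical illustration of that idea, not Docling's implementation: it treats any plugin whose module lives outside the `docling` package as external and drops it unless external plugins are allowed. The function name `select_plugins` and the entries in `found` are illustrative.

```python
def select_plugins(
    plugins: dict[str, str], allow_external_plugins: bool = False
) -> dict[str, str]:
    """Keep built-in plugins, and external ones only when explicitly allowed.

    `plugins` maps a plugin name to the module that registers it.
    """
    selected = {}
    for name, module in plugins.items():
        # Built-in means the module is `docling` itself or one of its submodules.
        is_builtin = module == "docling" or module.startswith("docling.")
        if is_builtin or allow_external_plugins:
            selected[name] = module
    return selected


# Illustrative input: one built-in plugin and one third-party plugin.
found = {
    "docling_defaults": "docling.models.plugins.defaults",
    "your_plugin_name": "your_package.module",
}
print(select_plugins(found))
print(select_plugins(found, allow_external_plugins=True))
```

Checking for the `docling.` prefix (with the dot) avoids accidentally trusting an unrelated package whose name merely starts with `docling`.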
Using the `docling` CLI
Similarly, users of the `docling` CLI have to enable external plugins before they can select an engine provided by one.
```sh
# Show the external plugins
docling --show-external-plugins

# Run docling with the new plugin
docling --allow-external-plugins --ocr-engine=NAME
```
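If you drive the CLI from a Python script, the two flags have to travel together. A minimal sketch, assuming the `docling` executable is on `PATH` and takes the input document as a positional argument; the helpers `build_docling_command` and `run_docling` are hypothetical, and only the flags shown above are used.

```python
import subprocess
from typing import List, Optional


def build_docling_command(source: str, ocr_engine: Optional[str] = None) -> List[str]:
    """Assemble a docling CLI invocation.

    When an OCR engine is requested, external plugins are enabled as well,
    since the engine may come from a third-party plugin.
    """
    cmd = ["docling"]
    if ocr_engine is not None:
        cmd += ["--allow-external-plugins", f"--ocr-engine={ocr_engine}"]
    cmd.append(source)
    return cmd


def run_docling(source: str, ocr_engine: Optional[str] = None) -> int:
    # check=False: hand back the exit code instead of raising on failure.
    completed = subprocess.run(build_docling_command(source, ocr_engine), check=False)
    return completed.returncode
```

For example, `build_docling_command("report.pdf", ocr_engine="NAME")` yields the argument list for `docling --allow-external-plugins --ocr-engine=NAME report.pdf`, where `NAME` is the engine name your plugin registers.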