Schema Definition
Pydantic templates are the schema contract for all extraction modes (direct, staged, delta).
Core rules
- Declare entities explicitly (with `graph_id_fields`) and mark components with `is_entity=False`.
- Keep identity fields short, stable, and required when possible; prefer descriptive IDs over raw section or figure labels.
- Prefer 2-4 nesting levels; flatten deeply recursive structures.
- Use `edge(label=...)` consistently for relationship-bearing fields.
- Write extraction-oriented descriptions and realistic examples for every important field.
- Use validators to correct semantic errors (e.g., a wrong unit in an amount) and to deduplicate root-level lists when using chunked extraction.
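The following is a minimal sketch of the first five rules, assuming Pydantic v2 and assuming that `graph_id_fields` and `is_entity` are read from each model's `model_config`. The `edge` function below is a hypothetical stand-in that only mimics the `edge(label=...)` call shape by storing the label under an assumed `edge_label` key in the field's JSON schema; in a real template, use the helper your template library provides. The `Address`, `Person`, and `Organization` models are illustrative.

```python
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field


def edge(label: str, **field_kwargs):
    """Hypothetical stand-in for edge(label=...): a regular pydantic Field
    that records the relationship label under an assumed "edge_label" key."""
    return Field(json_schema_extra={"edge_label": label}, **field_kwargs)


class Address(BaseModel):
    """Component: no identity of its own; it only exists inside its parent."""

    model_config = ConfigDict(is_entity=False)  # assumed component marker

    street: Optional[str] = Field(
        None, description="Street line exactly as printed.", examples=["12 Rue de Rivoli"]
    )
    city: Optional[str] = Field(None, description="City name only.", examples=["Paris"])


class Person(BaseModel):
    """Entity identified by a short, stable, required name."""

    model_config = ConfigDict(graph_id_fields=["full_name"])  # assumed identity config

    full_name: str = Field(
        ..., description="Full name as written, without titles.", examples=["Marie Curie"]
    )
    role: Optional[str] = Field(None, description="Job title or function.", examples=["CFO"])


class Organization(BaseModel):
    """Root entity; nesting stays at two levels (Organization -> Person/Address)."""

    model_config = ConfigDict(graph_id_fields=["legal_name"])  # assumed identity config

    legal_name: str = Field(
        ...,
        description="Registered legal name, not a section or figure label.",
        examples=["Acme Holdings SA"],
    )
    address: Optional[Address] = edge(
        label="LOCATED_AT", default=None, description="Registered office address."
    )
    employees: List[Person] = edge(
        label="EMPLOYS",
        default_factory=list,
        description="People named as working for the organization.",
    )
```

In this sketch, `Organization.model_json_schema()` carries the `edge_label` entry on the `address` and `employees` properties, which is one place a graph builder could read relationship labels from.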
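Both validator uses from the last rule can be sketched with Pydantic v2's `model_validator`. The unit-correction table, the `Amount`, `Company`, and `Report` models, the `is_entity`/`graph_id_fields` config keys, and the whitespace- and case-insensitive dedup key are illustrative assumptions; adapt them to your domain.

```python
from typing import Any, List, Optional

from pydantic import BaseModel, ConfigDict, Field, model_validator

# Illustrative correction table: raw unit as extracted -> (ISO 4217 code, multiplier).
UNIT_FIXES = {
    "€": ("EUR", 1),
    "$": ("USD", 1),
    "k€": ("EUR", 1_000),
    "kEUR": ("EUR", 1_000),
    "MEUR": ("EUR", 1_000_000),
}


class Amount(BaseModel):
    model_config = ConfigDict(is_entity=False)  # assumed component marker

    value: float = Field(..., description="Amount in base currency units.", examples=[12500.0])
    unit: str = Field(..., description="ISO 4217 currency code.", examples=["EUR"])

    @model_validator(mode="before")
    @classmethod
    def fix_unit(cls, data: Any) -> Any:
        # Runs before field validation, so value and unit are corrected together.
        if not isinstance(data, dict) or "unit" not in data:
            return data
        data = dict(data)  # avoid mutating the caller's dict
        raw = str(data["unit"]).strip()
        code, factor = UNIT_FIXES.get(raw, (raw.upper(), 1))
        data["unit"] = code
        if factor != 1 and data.get("value") is not None:
            data["value"] = float(data["value"]) * factor
        return data


class Company(BaseModel):
    model_config = ConfigDict(graph_id_fields=["name"])  # assumed identity config

    name: str
    revenue: Optional[Amount] = None


class Report(BaseModel):
    """Root model whose list may contain repeats after chunked extraction."""

    companies: List[Company] = Field(default_factory=list)

    @model_validator(mode="after")
    def dedupe_companies(self) -> "Report":
        seen = set()
        unique = []
        for company in self.companies:
            key = " ".join(company.name.split()).casefold()  # normalized identity key
            if key not in seen:
                seen.add(key)
                unique.append(company)
        self.companies = unique  # keeps the first occurrence of each identity
        return self
```

This sketch keeps the first occurrence of each company. If later chunks tend to carry fields the first one lacks, merging the duplicates instead of dropping them is the safer design.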
Extraction-focused design
- Direct: optimize for semantic clarity and validation tolerance.
- Staged: optimize for ID discovery and deterministic parent linkage.
- Delta: optimize for path fidelity, flat properties, canonicalized values, and merge-safe identities.
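As a standard-library-only illustration of what "canonicalized values" and "merge-safe identities" can mean for delta extraction, the sketch below keys each node by its template path plus a canonical ID and merges flat properties across batches. The record shape (`path`, `id`, flat `properties`), the `canonical` and `merge_nodes` helpers, and the first-non-empty-value-wins policy are assumptions made for illustration, not the library's merge implementation.

```python
import unicodedata
from typing import Dict, Iterable, List, Tuple

NodeKey = Tuple[str, str]  # (template path, canonical id)


def canonical(value: str) -> str:
    """Canonicalize a string: NFKC-normalize, collapse whitespace, casefold."""
    text = unicodedata.normalize("NFKC", value)
    return " ".join(text.split()).casefold()


def merge_nodes(batches: Iterable[List[dict]]) -> Dict[NodeKey, dict]:
    """Merge flat node records from several delta batches.

    Each record is assumed (hypothetically) to look like
    {"path": "organizations", "id": "Acme Corp", "properties": {...}}.
    """
    merged: Dict[NodeKey, dict] = {}
    for batch in batches:
        for record in batch:
            key = (record["path"], canonical(record["id"]))
            props = merged.setdefault(key, {})
            for name, value in record["properties"].items():
                if value is None or value == "":
                    continue  # empty values never overwrite anything
                props.setdefault(name, value)  # first non-empty value wins
    return merged
```

Because the key combines the template path with a canonical ID, two batches that spell the same organization differently ("Acme  Corp" vs. "acme corp") collapse into one node, and flat properties let the merge proceed key by key without reconciling nested structures.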
Recommended reading order
1. template-basics.md
2. entities-vs-components.md
3. field-definitions.md
4. relationships.md
5. best-practices.md
6. staged-extraction-schema.md
7. validation.md
8. advanced-patterns.md