Extraction
Bundata extraction turns unstructured documents (contracts, invoices, policy documents, legal filings) into structured, schema-aware output. Each run produces smart bites: chunks of content that carry metadata, source lineage, and an extraction confidence for quality checks and auditing, plus optional enrichment such as named entities and descriptions of images and tables.
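To make the parts of a smart bite concrete, here is a minimal Python sketch of one possible in-memory shape. It is not the Bundata API or output format. The class names (`SmartBite`, `SourceLineage`, `Enrichment`), the field names, the 0.0 to 1.0 confidence range, and the `needs_review` helper are all hypothetical. The sketch also shows how a confidence value can flag bites for manual review.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourceLineage:
    # Hypothetical lineage fields: where the bite came from in the source document.
    document_id: str
    page: Optional[int] = None


@dataclass
class Enrichment:
    # Hypothetical optional enrichment produced alongside the content.
    entities: list[str] = field(default_factory=list)
    image_descriptions: list[str] = field(default_factory=list)
    table_descriptions: list[str] = field(default_factory=list)


@dataclass
class SmartBite:
    content: str                # the extracted chunk of text
    metadata: dict[str, str]    # schema-derived metadata (assumed to be string key/values)
    lineage: SourceLineage      # traceability back to the source
    confidence: float           # assumed range: 0.0 (low) to 1.0 (high)
    enrichment: Optional[Enrichment] = None


def needs_review(bites: list[SmartBite], threshold: float = 0.8) -> list[SmartBite]:
    """Return bites whose confidence falls below the (hypothetical) review threshold."""
    return [b for b in bites if b.confidence < threshold]


# Illustrative data only.
bites = [
    SmartBite("Payment is due within 30 days.", {"doc_type": "invoice"},
              SourceLineage("inv-001", page=1), 0.95),
    SmartBite("Total due: 1,200.00", {"doc_type": "invoice"},
              SourceLineage("inv-002", page=3), 0.62,
              Enrichment(entities=["1,200.00"])),
]
flagged = needs_review(bites)
print([b.lineage.document_id for b in flagged])  # ['inv-002']
```

Keeping lineage as a separate object means a flagged bite can still be traced to its document and page when someone audits it.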
Overview
Extraction is the core of the document intelligence layer. You provide documents (or connect a source), choose or create a schema, and run extraction. The output is vector-ready: smart bites can be indexed into the Vector Catalog for semantic search and retrieval-augmented generation (RAG), or passed to agents and workflows.
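One way to picture "vector-ready" output is as a flat record with an ID, the text to embed, and metadata that keeps lineage attached. The following minimal Python sketch makes that assumption. The record shape (`id`, `text`, `metadata`), the `SmartBite` fields, and the `to_index_record` helper are hypothetical and do not describe the Vector Catalog's actual ingestion format. It derives a stable ID from the document and chunk position, so re-running extraction would overwrite an existing record rather than duplicate it.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SmartBite:
    # Hypothetical, simplified bite: just enough fields to build an index record.
    content: str
    document_id: str
    chunk_index: int
    confidence: float
    metadata: dict[str, str]


def to_index_record(bite: SmartBite) -> dict[str, object]:
    """Convert a bite into a hypothetical vector-index record with lineage preserved."""
    # Stable ID from document + chunk position: the same chunk always maps to the same ID.
    raw = f"{bite.document_id}:{bite.chunk_index}".encode("utf-8")
    record_id = hashlib.sha256(raw).hexdigest()[:16]
    return {
        "id": record_id,
        "text": bite.content,  # the text an embedding model would encode
        "metadata": {
            **bite.metadata,
            # Lineage and confidence travel with the record so search hits stay traceable.
            "source_document": bite.document_id,
            "chunk_index": bite.chunk_index,
            "confidence": bite.confidence,
        },
    }


bite = SmartBite("Either party may terminate with 60 days' notice.",
                 "contract-17", 4, 0.91, {"doc_type": "contract"})
record = to_index_record(bite)
print(record["metadata"]["source_document"])  # contract-17
```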
Key tasks
| Task | Guide |
|---|---|
| Understand how extraction works | Overview |
| See supported formats | Supported file types |
| Run and monitor jobs | Extraction runs |
| Improve quality and avoid pitfalls | Best practices |
Tutorials and concepts
- Quickstart — From zero to first extraction.
- Create your first extraction — Step-by-step first run.
- Schema Studio overview — Design the schema extraction uses.
Related product areas
- Vector Catalog — Index extraction output for search.
- Vector Search — Query with semantic search.
- Workflows — Orchestrate extraction and downstream steps.