Extraction

Bundata extraction turns unstructured documents (contracts, invoices, policy docs, legal filings) into structured, schema-aware output. Each run produces smart bites: chunks of content with metadata and optional enrichment (named entities, image and table descriptions). Each smart bite also carries source lineage and extraction confidence, which support quality checks and audits.
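To make the shape of a smart bite concrete, here is a minimal sketch in Python. It is not the Bundata output schema: the class names (`SmartBite`, `SourceLineage`), every field name, the 0.0 to 1.0 confidence range, and the 0.8 threshold are assumptions chosen for illustration. It only shows how content, metadata, enrichment, lineage, and confidence might sit together, and how confidence could be used to route low-quality chunks to review.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourceLineage:
    # Hypothetical fields: where in the source document the chunk came from.
    document_id: str
    page: Optional[int] = None


@dataclass
class SmartBite:
    # Hypothetical shape; the actual Bundata output may differ.
    content: str
    metadata: dict = field(default_factory=dict)
    entities: list = field(default_factory=list)  # optional enrichment
    lineage: Optional[SourceLineage] = None
    confidence: float = 0.0  # assumed to range from 0.0 to 1.0


def split_by_confidence(bites, threshold=0.8):
    """Return (accepted, needs_review) using an illustrative threshold."""
    accepted = [b for b in bites if b.confidence >= threshold]
    needs_review = [b for b in bites if b.confidence < threshold]
    return accepted, needs_review


# Made-up sample data, not real extraction output.
bites = [
    SmartBite("Payment is due within 30 days.", {"section": "Terms"},
              ["30 days"], SourceLineage("invoice-001", page=2), 0.95),
    SmartBite("Totl amnt: 1,2O0", {"section": "Summary"},
              [], SourceLineage("invoice-001", page=3), 0.41),
]
accepted, needs_review = split_by_confidence(bites)
print(len(accepted), len(needs_review))
```

Because each bite keeps its lineage, a chunk flagged for review can be traced back to its source document and page, which is the audit use the description above points to.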

Overview

Extraction is the core of the document intelligence layer. You provide documents (or connect a source), choose or create a schema, and run extraction. Output is vector-ready: smart bites can be indexed into the Vector Catalog for semantic search and RAG, or sent to agents and workflows.

Key tasks

Task | Guide
Understand how extraction works | Overview
See supported formats | Supported file types
Run and monitor jobs | Extraction runs
Improve quality and avoid pitfalls | Best practices

Tutorials and concepts