Extraction

Bundata extraction turns unstructured documents (contracts, invoices, policy docs, legal filings) into structured, schema-aware output. Each run produces smart bites: chunks of content with metadata and optional enrichment such as named entities and image and table descriptions. Every smart bite also carries source lineage and an extraction confidence, which support quality checks and audits.
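
As a rough illustration of that shape, a smart bite could be modeled as in the sketch below. The class names `SmartBite` and `Lineage`, every field name, and the 0.8 review threshold are assumptions made for this example; they are not Bundata's actual output schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Lineage:
    """Where a chunk came from (hypothetical fields)."""
    source_document: str
    page: Optional[int] = None


@dataclass
class SmartBite:
    """One chunk of extracted content (hypothetical shape, not the real schema)."""
    content: str
    lineage: Lineage
    confidence: float  # assumed to lie between 0.0 and 1.0
    metadata: dict = field(default_factory=dict)
    entities: list = field(default_factory=list)  # optional enrichment


def needs_review(bite: SmartBite, threshold: float = 0.8) -> bool:
    """Flag low-confidence bites for a quality check; the threshold is arbitrary."""
    return bite.confidence < threshold


bite = SmartBite(
    content="Payment is due within 30 days of invoice date.",
    lineage=Lineage(source_document="contract.pdf", page=4),
    confidence=0.72,
    entities=["30 days"],
)
print(needs_review(bite))  # True, because 0.72 < 0.8
```

Keeping lineage and confidence on each chunk, rather than on the run as a whole, is what lets a reviewer trace a single questionable value back to its page.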

Overview

Extraction is the core of the document intelligence layer. You provide documents (or connect a source), choose or create a schema, and run extraction. Output is vector-ready: smart bites can be indexed into the Vector Catalog for semantic search and RAG, or sent to agents and workflows.
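
The hand-off from extraction output to an index can be sketched in plain Python. The dictionary keys (`content`, `confidence`, `source`), the `to_index_records` helper, and the 0.8 cutoff are all hypothetical; the real smart bite format and the Vector Catalog API are not shown here.

```python
def to_index_records(bites, min_confidence=0.8):
    """Split bites into index-ready records and bites held back for review.

    `bites` is a list of dicts with hypothetical keys:
    content, confidence, source.
    """
    records, held_back = [], []
    for i, bite in enumerate(bites):
        if bite["confidence"] >= min_confidence:
            records.append({
                "id": f"{bite['source']}#{i}",  # position-based id, for illustration
                "text": bite["content"],
                "metadata": {
                    "source": bite["source"],
                    "confidence": bite["confidence"],
                },
            })
        else:
            held_back.append(bite)
    return records, held_back


bites = [
    {"content": "Invoice total: 1,200 USD", "confidence": 0.95, "source": "invoice.pdf"},
    {"content": "Net 3O days", "confidence": 0.41, "source": "invoice.pdf"},
]
records, held_back = to_index_records(bites)
print(len(records), len(held_back))  # 1 1
```

Carrying the source and confidence into each record's metadata means a search hit can still be traced back to its document after indexing.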

Key tasks

Task	Guide
Understand how extraction works	Overview
See supported formats	Supported file types
Run and monitor jobs	Extraction runs
Improve quality and avoid pitfalls	Best practices

Tutorials and concepts

Quickstart — From zero to first extraction.
Create your first extraction — Step-by-step first run.
Schema Studio overview — Design the schema extraction uses.

Related product areas

Vector Catalog — Index extraction output for search.
Vector Search — Query with semantic search.
Workflows — Orchestrate extraction and downstream steps.
