Architecture center

Patterns and best practices for designing document intelligence pipelines with Bundata.

Core architecture

Bundata acts as a document intelligence layer between your raw documents and your applications. A typical flow has four stages:

Ingest — Documents enter via source connectors (S3, SharePoint, Confluence, etc.) or direct upload.
Extract — Context-aware extraction produces schema-aware smart bites with metadata and optional enrichment (NER, image/table descriptions).
Chunk and embed — Content is chunked and optionally embedded for retrieval.
Deliver — Output goes to the Vector Catalog, your vector store, or downstream systems. Source lineage is preserved for grounded answers and audit.
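The four stages above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the Bundata API. Every name here is hypothetical: `Chunk`, `ingest`, `extract`, `chunk_words`, `embed`, `run_pipeline`, and the `doc://` URIs. An in-memory dict stands in for a source connector. A whitespace-window chunker and a toy hashing embedding stand in for context-aware extraction and a real embedding model. A plain list stands in for the Vector Catalog. The point it shows is that each chunk carries its source URI and position, so lineage survives into delivery.

```python
# Minimal sketch of the four stages. All names here are hypothetical
# stand-ins, not the Bundata API.
import hashlib
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Chunk:
    text: str
    source_uri: str   # lineage: where the chunk came from
    chunk_index: int  # lineage: position within the source document
    metadata: Dict[str, str] = field(default_factory=dict)
    embedding: Optional[List[float]] = None


def ingest(raw_docs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Ingest: stand-in for a source connector, returns (source_uri, raw_text)."""
    return sorted(raw_docs.items())


def extract(source_uri: str, raw_text: str) -> Tuple[str, Dict[str, str]]:
    """Extract: toy step that normalizes whitespace and records basic metadata."""
    lines = [ln.strip() for ln in raw_text.splitlines() if ln.strip()]
    metadata = {"title": lines[0] if lines else "", "source_uri": source_uri}
    return " ".join(lines), metadata


def chunk_words(text: str, size: int, overlap: int) -> List[str]:
    """Chunk: fixed-size word windows; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("require 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window.
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str, dim: int = 16) -> List[float]:
    """Embed: toy feature-hashing bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def run_pipeline(raw_docs: Dict[str, str], store: List[Chunk],
                 size: int = 50, overlap: int = 10) -> int:
    """Deliver: append chunks (with lineage) to `store`; returns count delivered."""
    delivered = 0
    for source_uri, raw_text in ingest(raw_docs):
        text, metadata = extract(source_uri, raw_text)
        for i, piece in enumerate(chunk_words(text, size, overlap)):
            store.append(Chunk(piece, source_uri, i, dict(metadata), embed(piece)))
            delivered += 1
    return delivered
```

For example, a 12-word document run with `size=4` and `overlap=1` yields four overlapping chunks. Each chunk records the document's URI and its own index. The overlap is a common design choice: a phrase split across a window boundary stays retrievable from either neighbor.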

Schema-first design — Define schemas in Schema Studio before scaling extraction. See Schema design and Extraction best practices.
Workflow orchestration — Use Workflows for scheduled runs, triggers, and retries so the Vector Catalog stays fresh. See Workflows overview and Triggers & scheduling.
Grounding for agents — Query the Vector Catalog for semantic retrieval, then pass the retrieved chunks to the LLM together with their source lineage. See Grounding from catalog and Vector Search overview.
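The grounding pattern described above can be illustrated with a small sketch that retrieves chunks and hands them to an LLM with their lineage attached. It makes several assumptions and does not show how the Vector Catalog is actually queried. Catalog entries are assumed to already carry embeddings. The query vector is assumed to come from the same embedding model. A brute-force cosine ranking stands in for a catalog query. `CatalogHit`, `retrieve`, `build_grounded_prompt`, and the prompt wording are all hypothetical. The sketch numbers each chunk and prefixes it with its source and position, so the model can cite `[n]` and the answer can be traced back for audit.

```python
# Minimal grounding sketch. Names and prompt wording are hypothetical,
# not the Bundata API; brute-force ranking stands in for a catalog query.
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CatalogHit:
    text: str
    source_uri: str   # lineage carried through from ingestion
    chunk_index: int
    vector: List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec: Sequence[float], catalog: List[CatalogHit],
             k: int = 3) -> List[CatalogHit]:
    """Semantic retrieval by exhaustive ranking (O(n) per query)."""
    ranked = sorted(catalog, key=lambda h: cosine(query_vec, h.vector),
                    reverse=True)
    return ranked[:k]


def build_grounded_prompt(question: str, hits: List[CatalogHit]) -> str:
    """Number each chunk and keep its source so the model can cite it."""
    context = "\n".join(
        f"[{i}] ({h.source_uri}#chunk-{h.chunk_index}) {h.text}"
        for i, h in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them as [n].\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

Exhaustive ranking keeps the sketch self-contained. A production vector store would typically use an approximate nearest-neighbor index in its place, and the prompt-building step would stay the same.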