Architecture center

Patterns and best practices for designing document intelligence pipelines with Bundata.

Core architecture

Bundata acts as a document intelligence layer between your raw documents and your applications. Typical flow:

  1. Ingest — Documents enter via source connectors (S3, SharePoint, Confluence, etc.) or direct upload.
  2. Extract — Context-aware extraction produces smart bites that conform to your schema, each carrying metadata and optional enrichment such as named-entity recognition (NER) and image/table descriptions.
  3. Chunk and embed — Content is chunked and optionally embedded for retrieval.
  4. Deliver — Output goes to the Vector Catalog, your vector store, or downstream systems. Source lineage is preserved so that answers can be grounded in the original documents and audited against them.
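
The four stages above can be sketched in plain Python to show how source lineage travels with each piece of content. This is a minimal illustration, not the Bundata API: every name here (Document, SmartBite, ingest, extract, chunk, embed, deliver, run_pipeline) is hypothetical, extraction is reduced to splitting on blank lines, the embedding is a toy stand-in for a real model, and the "vector store" is an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    source_uri: str  # hypothetical: connector path or upload identifier
    text: str


@dataclass
class SmartBite:
    text: str
    source_uri: str  # lineage back to the originating document
    offset: int      # character offset of this text within the source document
    metadata: Dict[str, str] = field(default_factory=dict)


def ingest(raw: Dict[str, str]) -> List[Document]:
    """Stage 1: wrap raw content, keyed by its source URI."""
    return [Document(source_uri=uri, text=text) for uri, text in raw.items()]


def extract(doc: Document) -> List[SmartBite]:
    """Stage 2: split on blank lines as a stand-in for context-aware extraction."""
    bites: List[SmartBite] = []
    cursor = 0  # start position of the current block in doc.text
    for block in doc.text.split("\n\n"):
        stripped = block.strip()
        if stripped:
            offset = doc.text.index(stripped, cursor)
            bites.append(SmartBite(stripped, doc.source_uri, offset))
        cursor += len(block) + 2  # skip the block and its "\n\n" separator
    return bites


def chunk(bite: SmartBite, max_chars: int = 40) -> List[SmartBite]:
    """Stage 3a: fixed-width chunking; each chunk keeps the parent's lineage."""
    return [
        SmartBite(bite.text[i:i + max_chars], bite.source_uri,
                  bite.offset + i, dict(bite.metadata))
        for i in range(0, len(bite.text), max_chars)
    ]


def embed(text: str, dims: int = 4) -> List[float]:
    """Stage 3b: toy deterministic embedding (assumption: not a real model)."""
    vec = [0.0] * dims
    for i, ch in enumerate(text):
        vec[i % dims] += ord(ch)
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def deliver(chunks: List[SmartBite], store: List[dict]) -> None:
    """Stage 4: write each chunk, its vector, and its lineage to the store."""
    for c in chunks:
        store.append({
            "text": c.text,
            "vector": embed(c.text),
            "source_uri": c.source_uri,
            "offset": c.offset,
        })


def run_pipeline(raw: Dict[str, str]) -> List[dict]:
    """Run ingest, extract, chunk and embed, then deliver over all documents."""
    store: List[dict] = []
    for doc in ingest(raw):
        for bite in extract(doc):
            deliver(chunk(bite), store)
    return store


if __name__ == "__main__":
    sample = {"s3://example-bucket/handbook.txt":
              "Intro paragraph.\n\nSecond paragraph with more detail."}
    for record in run_pipeline(sample):
        print(record["source_uri"], record["offset"], record["text"])
```

Because every record carries source_uri and offset, a downstream application can cite the exact span of the original document that a retrieved chunk came from, which is the property the Deliver step relies on for grounding and audit.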

Key patterns

Use-case guides

Reference

  • Glossary — Smart bites, Vector Catalog, extraction confidence, source lineage
  • Limits and quotas — Rate limits and capacity