Architecture center
Patterns and best practices for designing document intelligence pipelines with Bundata.
Core architecture
Bundata acts as a document intelligence layer between your raw documents and your applications. A typical flow has four stages:
- Ingest — Documents enter via source connectors (S3, SharePoint, Confluence, etc.) or direct upload.
- Extract — Context-aware extraction produces schema-aware smart bites with metadata and optional enrichment (NER, image/table descriptions).
- Chunk and embed — Content is chunked and optionally embedded for retrieval.
- Deliver — Output goes to the Vector Catalog, your vector store, or downstream systems. Source lineage is preserved for grounded answers and audit.
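The four stages above can be sketched as a chain of small functions. This is a minimal illustration in plain Python, not Bundata's API: every name here (`Chunk`, `ingest`, `extract`, `chunk`, `deliver`, `run_pipeline`) is hypothetical, the "connector" is an in-memory dict, and embedding is omitted. The point it shows is that each chunk carries its source URI and offset, so lineage survives through to delivery.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    source_uri: str  # lineage: the document this chunk came from
    start: int       # lineage: character offset within the extracted text
    metadata: dict = field(default_factory=dict)


def ingest(documents: dict[str, str]) -> list[tuple[str, str]]:
    """Hypothetical stand-in for a source connector: returns (uri, raw_text) pairs."""
    return list(documents.items())


def extract(uri: str, raw_text: str) -> dict:
    """Hypothetical stand-in for extraction: normalizes whitespace, attaches metadata."""
    text = " ".join(raw_text.split())
    return {"uri": uri, "text": text, "metadata": {"length": len(text)}}


def chunk(record: dict, size: int = 40) -> list[Chunk]:
    """Fixed-size chunking (an assumption) that copies lineage onto every chunk."""
    text = record["text"]
    return [
        Chunk(text[i:i + size], record["uri"], i, dict(record["metadata"]))
        for i in range(0, len(text), size)
    ]


def deliver(chunks: list[Chunk], store: list[Chunk]) -> None:
    """Hypothetical stand-in for delivery: appends to an in-memory store."""
    store.extend(chunks)


def run_pipeline(documents: dict[str, str], size: int = 40) -> list[Chunk]:
    store: list[Chunk] = []
    for uri, raw in ingest(documents):
        deliver(chunk(extract(uri, raw), size), store)
    return store
```

Calling `run_pipeline({"s3://bucket/a.txt": "some raw text"})` returns a list of `Chunk` objects, each of which can be traced back to its source document through `source_uri` and `start`.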
Key patterns
- Schema-first design — Define schemas in Schema Studio before scaling extraction. See Schema design and Extraction best practices.
- Workflow orchestration — Use Workflows for scheduled runs, triggers, and retries so the Vector Catalog stays fresh. See Workflows overview and Triggers & scheduling.
- Grounding for agents — Query the Vector Catalog for semantic retrieval and pass chunks to LLMs with source lineage. See Grounding from catalog and Vector Search overview.
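As an illustration of the grounding pattern, the sketch below ranks catalog entries by cosine similarity and builds a prompt that keeps each chunk's source next to its text. It assumes an in-memory list of dicts standing in for the Vector Catalog and precomputed toy vectors; `cosine`, `top_k`, `build_grounded_prompt`, and the entry fields `text`, `vector`, and `source` are all hypothetical names, not Bundata's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], catalog: list[dict], k: int = 2) -> list[dict]:
    """Return the k catalog entries most similar to the query vector."""
    ranked = sorted(
        catalog,
        key=lambda entry: cosine(query_vec, entry["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(question: str, hits: list[dict]) -> str:
    """Number each chunk and keep its source beside it so answers can cite it."""
    context = "\n".join(
        f"[{i}] {hit['text']} (source: {hit['source']})"
        for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered context and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting prompt string can be passed to any LLM client. Keeping the source in the prompt lets the model cite it and lets a reviewer trace an answer back to the originating document.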
Use-case guides
- First extraction — End-to-end extraction run
- First agent — Agent grounded on the catalog
- Workflows — Automation and scheduling
- Integrations — Connectors and cloud
Reference
- Glossary — Smart bites, Vector Catalog, extraction confidence, source lineage
- Limits and quotas — Rate limits and capacity