Workflows overview
Workflows orchestrate the document intelligence pipeline: ingestion, extraction, enrichment, chunking, and delivery. They can run on a schedule, on demand, or in response to new data. Use workflows to keep the Vector Catalog up to date and to automate document preprocessing for agents and downstream systems.
What workflows do
- Trigger — Run on a schedule (e.g. hourly, nightly), on demand, or when new files arrive in a connected source.
- Execute steps — Chain extraction, validation, optional embedding, and delivery to the Vector Catalog or other destinations.
- Retry and validate — Retry failed steps and use data validation checkpoints where supported.
- Monitor — Track run history, success/failure, and quality metrics.
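The execute-retry behavior above can be sketched in plain Python. This is an illustrative model, not Bundata's actual API: the step names, the retry policy, and the sample payload are all assumptions.

```python
import time

def run_with_retry(step, payload, max_attempts=3, base_delay=0.01):
    """Run one workflow step, retrying failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step(payload)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt; monitoring surfaces this
            time.sleep(base_delay * 2 ** (attempt - 1))

def run_workflow(steps, payload):
    """Chain steps so each step's output feeds the next, recording run history."""
    history = []
    for name, step in steps:
        payload = run_with_retry(step, payload)
        history.append(name)
    return payload, history

# Hypothetical steps standing in for extraction, validation, and delivery.
steps = [
    ("extract",  lambda doc: {**doc, "fields": {"title": "Q3 report"}}),
    ("validate", lambda doc: doc if doc["fields"] else None),
    ("deliver",  lambda doc: {**doc, "delivered": True}),
]
result, history = run_workflow(steps, {"source": "s3://bucket/report.pdf"})
```

A real workflow engine would persist the run history and surface retries in monitoring; the chaining and backoff shape is the same.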
When to use workflows
- You need continuous data delivery — Keep AI outputs and search indexes in sync with the latest business content.
- You want automation — Pipelines run on a schedule or when data changes, with no manual triggering.

- You are scaling — Run many extractions and keep the catalog fresh without scaling headcount.
How it fits with the rest of Bundata
- Extraction — Workflows invoke extraction runs; configure schema and options per workflow.
- Vector Catalog — Many workflows send output to the catalog for search and agents.
- Integrations — Source and destination connectors are configured in the workflow.
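The pieces above come together in a single workflow definition: a source connector, a per-workflow extraction schema, and a destination. The structure below is illustrative only; Bundata's real configuration format, connector names, and option keys may differ.

```python
# Illustrative workflow definition; field names are assumptions, not Bundata's schema.
workflow = {
    "name": "invoices-to-catalog",
    "trigger": {"type": "schedule", "cron": "0 * * * *"},       # hourly
    "source": {"connector": "sharepoint", "path": "/Finance/Invoices"},
    "steps": [
        {"type": "extract", "schema": "invoice_v2"},            # schema set per workflow
        {"type": "validate", "on_failure": "route_to_review"},  # validation checkpoint
        {"type": "embed"},                                      # optional embedding
        {"type": "deliver", "destination": "vector_catalog"},
    ],
}
```

Keeping source, schema, and destination in one definition is what lets a workflow be re-run, scheduled, and monitored as a unit.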
DAG concepts
Workflows are modeled as a directed acyclic graph (DAG) of steps: each step runs after its dependencies. Branching and conditional logic let you route documents by type or validation result.
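A minimal sketch of the DAG model, using Python's standard-library `graphlib` to compute a valid execution order. The step names and the routing rule are hypothetical.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (illustrative names).
dag = {
    "ingest":   set(),
    "extract":  {"ingest"},
    "validate": {"extract"},
    "embed":    {"validate"},
    "deliver":  {"embed"},
    "review":   {"validate"},  # branch for documents that fail validation
}

# static_order() yields the steps in an order that respects every dependency.
order = list(TopologicalSorter(dag).static_order())

def route(doc):
    """Conditional routing: send valid documents onward, the rest to review."""
    return "embed" if doc.get("valid") else "review"
```

Because the graph is acyclic, a topological order always exists; a real engine would also run independent branches (here, `embed` and `review`) concurrently.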
Common mistakes
- No schedule or trigger — Manual-only workflows leave the catalog stale between runs; add a schedule or event trigger.
- Ignoring failures — Use monitoring and alerts to catch failed runs and fix connectors or schemas.
- Overloading one workflow — Split by source or use case so failures and retries are isolated.
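The "ignoring failures" mistake is cheap to avoid: scan recent run history and flag failures before they pile up. A minimal sketch, assuming a run record shape (`status`, `error`) that is illustrative rather than Bundata's actual monitoring API.

```python
def failed_runs(history, window=10):
    """Return failed runs among the most recent `window` entries, for alerting."""
    return [run for run in history[-window:] if run["status"] == "failed"]

# Hypothetical run history as a monitoring endpoint might report it.
runs = [
    {"id": 1, "workflow": "invoices",  "status": "succeeded"},
    {"id": 2, "workflow": "contracts", "status": "failed", "error": "schema mismatch"},
    {"id": 3, "workflow": "invoices",  "status": "succeeded"},
]
alerts = failed_runs(runs)
```

Splitting workflows by source or use case makes this kind of check more useful, since each alert then points at one connector or schema rather than a tangle of them.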
Next steps
- Triggers & scheduling — Configure when workflows run.
- Monitoring — Run history and alerting.
- Extraction overview — What runs inside a workflow step.