Workflows overview
Workflows orchestrate the document intelligence pipeline: ingestion, extraction, enrichment, chunking, and delivery. They can run on a schedule, on demand, or in response to new data. Use workflows to keep the Vector Catalog up to date and to automate document preprocessing for agents and downstream systems.
What workflows do
- Trigger — Run on a schedule (e.g. hourly, nightly), on demand, or when new files arrive in a connected source.
- Execute steps — Chain extraction, validation, optional embedding, and delivery to the Vector Catalog or other destinations.
- Retry and validate — Retry failed steps and use data validation checkpoints where supported.
- Monitor — Track run history, success/failure, and quality metrics.
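The execute-retry behavior above can be sketched in plain Python. This is an illustrative model, not Bundata's actual API: the step names, the retry policy, and the sample payload are all assumptions.

```python
import time

def run_with_retry(step, payload, max_attempts=3, base_delay=0.01):
    """Run one workflow step, retrying failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step(payload)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt; monitoring surfaces this
            time.sleep(base_delay * 2 ** (attempt - 1))

def run_workflow(steps, payload):
    """Chain steps so each step's output feeds the next, recording run history."""
    history = []
    for name, step in steps:
        payload = run_with_retry(step, payload)
        history.append(name)
    return payload, history

# Hypothetical steps standing in for extraction, validation, and delivery.
steps = [
    ("extract",  lambda doc: {**doc, "fields": {"title": "Q3 report"}}),
    ("validate", lambda doc: doc if doc["fields"] else None),
    ("deliver",  lambda doc: {**doc, "delivered": True}),
]
result, history = run_workflow(steps, {"source": "s3://bucket/report.pdf"})
```

A real workflow engine would persist the run history and surface retries in monitoring; the chaining and backoff shape is the same.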
When to use workflows
- You need continuous data delivery — Keep AI outputs and search indexes in sync with the latest business content.
- You want automation — Pipelines run on a schedule or when data changes, with no manual triggering.

- You are scaling — Run many extractions and keep the catalog fresh without scaling headcount.
How it fits with the rest of Bundata
- Extraction — Workflows invoke extraction runs; configure schema and options per workflow.
- Vector Catalog — Many workflows send output to the catalog for search and agents.
- Integrations — Source and destination connectors are configured in the workflow.
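The pieces above come together in a single workflow definition: a source connector, a per-workflow extraction schema, and a destination. The structure below is illustrative only; Bundata's real configuration format, connector names, and option keys may differ.

```python
# Illustrative workflow definition; field names are assumptions, not Bundata's schema.
workflow = {
    "name": "invoices-to-catalog",
    "trigger": {"type": "schedule", "cron": "0 * * * *"},       # hourly
    "source": {"connector": "sharepoint", "path": "/Finance/Invoices"},
    "steps": [
        {"type": "extract", "schema": "invoice_v2"},            # schema set per workflow
        {"type": "validate", "on_failure": "route_to_review"},  # validation checkpoint
        {"type": "embed"},                                      # optional embedding
        {"type": "deliver", "destination": "vector_catalog"},
    ],
}
```

Keeping source, schema, and destination in one definition is what lets a workflow be re-run, scheduled, and monitored as a unit.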
DAG concepts
Workflows are modeled as a directed acyclic graph (DAG) of steps: each step runs after its dependencies. Branching and conditional logic let you route documents by type or validation result.
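A minimal sketch of the DAG model, using Python's standard-library `graphlib` to compute a valid execution order. The step names and the routing rule are hypothetical.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (illustrative names).
dag = {
    "ingest":   set(),
    "extract":  {"ingest"},
    "validate": {"extract"},
    "embed":    {"validate"},
    "deliver":  {"embed"},
    "review":   {"validate"},  # branch for documents that fail validation
}

# static_order() yields the steps in an order that respects every dependency.
order = list(TopologicalSorter(dag).static_order())

def route(doc):
    """Conditional routing: send valid documents onward, the rest to review."""
    return "embed" if doc.get("valid") else "review"
```

Because the graph is acyclic, a topological order always exists; a real engine would also run independent branches (here, `embed` and `review`) concurrently.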
Common mistakes
- No schedule or trigger — Manual-only workflows leave the catalog stale between runs; add a schedule or event trigger.
- Ignoring failures — Use monitoring and alerts to catch failed runs and fix connectors or schemas.
- Overloading one workflow — Split by source or use case so failures and retries are isolated.
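The "ignoring failures" mistake is cheap to avoid: scan recent run history and flag failures before they pile up. A minimal sketch, assuming a run record shape (`status`, `error`) that is illustrative rather than Bundata's actual monitoring API.

```python
def failed_runs(history, window=10):
    """Return failed runs among the most recent `window` entries, for alerting."""
    return [run for run in history[-window:] if run["status"] == "failed"]

# Hypothetical run history as a monitoring endpoint might report it.
runs = [
    {"id": 1, "workflow": "invoices",  "status": "succeeded"},
    {"id": 2, "workflow": "contracts", "status": "failed", "error": "schema mismatch"},
    {"id": 3, "workflow": "invoices",  "status": "succeeded"},
]
alerts = failed_runs(runs)
```

Splitting workflows by source or use case makes this kind of check more useful, since each alert then points at one connector or schema rather than a tangle of them.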
Next steps
- Triggers & scheduling — Configure when workflows run.
- Monitoring — Run history and alerting.
- Extraction overview — What runs inside a workflow step.