Workflows overview

Workflows orchestrate the document intelligence pipeline: ingestion, extraction, enrichment, chunking, and delivery. They can run on a schedule, on demand, or in response to new data. Use workflows to keep the Vector Catalog up to date and to automate document preprocessing for agents and downstream systems.

What workflows do

  • Trigger — Run on a schedule (e.g. hourly, nightly), on demand, or when new files arrive in a connected source.
  • Execute steps — Chain extraction, validation, optional embedding, and delivery to the Vector Catalog or other destinations.
  • Retry and validate — Retry failed steps and use data validation checkpoints where supported.
  • Monitor — Track run history, success/failure, and quality metrics.
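The step-chaining and retry behavior above can be sketched in a few lines. This is an illustrative sketch, not Bundata's actual retry implementation; the function names (`run_with_retry`, `run_pipeline`) and the exponential-backoff policy are assumptions for the example.

```python
import time

def run_with_retry(step, payload, attempts=3, backoff=1.0):
    """Run one workflow step, retrying transient failures
    with exponential backoff before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            return step(payload)
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff * 2 ** (attempt - 1))

def run_pipeline(steps, payload):
    # Execute steps in order; each step's output feeds the next,
    # mirroring extraction -> validation -> embedding -> delivery.
    for step in steps:
        payload = run_with_retry(step, payload)
    return payload
```

A real orchestrator would also persist run state between steps so that monitoring and run history (as described above) survive process restarts.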

When to use workflows

  • You need continuous data delivery — Keep AI outputs and search indexes in sync with the latest business content.
  • You want automation — No manual triggers; pipelines run when data changes or on a schedule.
  • You are scaling — Run many extractions and keep the catalog fresh without scaling headcount.

How it fits with the rest of Bundata

  • Extraction — Workflows invoke extraction runs; configure schema and options per workflow.
  • Vector Catalog — Many workflows send output to the catalog for search and agents.
  • Integrations — Source and destination connectors are configured in the workflow.
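Putting the three pieces together, a workflow definition might look like the sketch below. Every field name here (`trigger`, `source`, `extraction`, `destination`) and the connector names are hypothetical, chosen to illustrate how schema, connectors, and scheduling relate; they are not Bundata API identifiers.

```python
# Hypothetical workflow definition: a scheduled trigger, a source
# connector, an extraction schema, and a destination in the catalog.
workflow = {
    "name": "invoices-to-catalog",
    "trigger": {"schedule": "0 * * * *"},          # hourly, cron syntax
    "source": {"connector": "s3", "prefix": "invoices/"},
    "extraction": {"schema": "invoice_v2"},
    "destination": {"connector": "vector_catalog"},
}
```

Keeping schema and connector settings per workflow (rather than global) is what lets each pipeline be split, retried, and monitored independently.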

DAG concepts

Workflows are modeled as a directed acyclic graph (DAG) of steps: each step runs only after all of its dependencies have completed. Branching and conditional logic let you route documents by type or validation result.
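The dependency ordering a DAG implies can be sketched with the standard library's `graphlib`. The step names and the `route` helper are illustrative assumptions, not Bundata identifiers.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
dag = {
    "ingest": set(),
    "extract": {"ingest"},
    "validate": {"extract"},
    "embed": {"validate"},
    "deliver": {"embed"},
}

# graphlib resolves a valid execution order and raises
# CycleError if the graph is not acyclic.
order = list(TopologicalSorter(dag).static_order())

def route(doc):
    # Conditional branching: send documents that fail
    # validation to a review step instead of embedding.
    return "embed" if doc["valid"] else "review"
```

Because the graph is acyclic, every run terminates, and independent branches can in principle execute in parallel.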

Common mistakes

  • No schedule or trigger — Workflows that are only manual may leave the catalog stale.
  • Ignoring failures — Use monitoring and alerts to catch failed runs and fix connectors or schemas.
  • Overloading one workflow — Split by source or use case so failures and retries are isolated.

Next steps