Run your first workflow

This tutorial walks you through creating and running your first workflow in Bundata. Workflows orchestrate ingestion, schema-aware extraction, validation, vectorization, and delivery in one operational layer so you can turn document processing into repeatable production systems. From legal documents and contracts to receipts and operational files, Bundata Workflows help you move from raw content to trusted downstream automation.

What workflows are in Bundata

Bundata Workflows are the orchestration layer for document AI. They provide:

  • Source-aware ingestion — Trigger workflows from storage, uploads, integrations, or external systems.
  • Extraction and validation — Run schema-aware extraction and verify outputs before publication.
  • Vector publishing — Send approved smart bites into the Vector Catalog and vector-ready collections.
  • Operational delivery — Push outputs to agents, search, APIs, webhooks, and downstream workflows.

Workflows can run on a schedule (e.g. hourly syncs, nightly batch jobs) or on triggers (e.g. new file in a connected source). Workflow orchestration keeps the catalog and downstream systems up to date with the latest business content. Product overview: Product → Workflows. Platform: Platform → Workflows.
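The two trigger styles can be sketched as plain configuration objects. This is an illustrative sketch only — the field names (`type`, `cron`, `source`, `event`) are assumptions for explanation, not the actual Bundata API:

```python
# Hypothetical trigger configurations. Field names are illustrative,
# not the real Bundata workflow API.
schedule_trigger = {
    "type": "schedule",
    "cron": "0 2 * * *",  # nightly batch job at 2 AM
}

source_trigger = {
    "type": "source_event",
    "source": "s3://contracts-bucket/incoming/",  # a connected source
    "event": "file_created",                      # fire on each new file
}

def describe(trigger):
    """Return a one-line summary of when a workflow will run."""
    if trigger["type"] == "schedule":
        return f"runs on cron '{trigger['cron']}'"
    return f"runs when '{trigger['event']}' occurs in {trigger['source']}"
```

A schedule keeps the catalog current on a fixed cadence; a source event keeps latency low by reacting to each new document as it arrives.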

Prerequisites

  • A Bundata account and access to the platform. See Sign up and Quickstart.
  • A schema to use for extraction. If you don’t have one, complete First schema.
  • (Optional) A connected source (e.g. S3, SharePoint) if you want to trigger on new documents; otherwise you can trigger manually or on a schedule. See Integrations → Sources.

Step 1: Open Workflows

  1. Sign in to the Bundata Platform.
  2. Go to Workflows (e.g. /platform/workflows).
  3. Confirm you are in the correct workspace.

Here you create and manage workflow definitions (triggers and steps) and view run history.

Step 2: Create a workflow

  1. Click Create workflow (or equivalent).
  2. Give the workflow a name that reflects its purpose, e.g. Nightly contract ingestion or Invoice extraction to catalog.
  3. Add a trigger: either a schedule (e.g. daily at 2 AM) or a source event (e.g. new file in a connected bucket or folder). See Triggers & scheduling.
  4. Add at least one step that runs extraction. Configure the step with:
    • Schema — Select the schema you want extraction to use so output is schema-aware.
    • Source — Where to read documents from (e.g. a connected source, or files passed from the trigger).
    • Destination — Where to send the output, e.g. a Vector Catalog collection, an API, or a webhook. See Integrations → Destinations.

Scheduled jobs keep pipelines up to date without manual triggers; smart routing and auto-optimization (where available) help send each document through the right processing steps.

Step 3: Configure the extraction step

  1. In the extraction step, select the schema (and optionally version) so all runs produce smart bites that conform to that structure.
  2. Set the source of documents (e.g. a specific folder, connector, or the trigger payload).
  3. Set the destination: e.g. a Vector Catalog collection so results are searchable and available for agents and Vector Search. You can also send to webhooks, APIs, or other systems.
  4. Save the step and the workflow.

This defines the document intelligence layer for this pipeline: unstructured inputs → context-aware extraction → vector-ready intelligence → catalog and downstream systems.
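A quick way to think about steps 1–3 is as three required settings on the extraction step; a run can only produce schema-aware output if all three are present. A small sketch (the setting names are assumptions, mirroring this tutorial rather than the actual API):

```python
def missing_settings(step):
    """Return which of the three required extraction settings are absent.

    "schema", "source", and "destination" mirror steps 1-3 above; the
    key names are illustrative, not the actual Bundata API.
    """
    required = ("schema", "source", "destination")
    return [key for key in required if not step.get(key)]
```

For example, a step configured with only a schema would come back with `["source", "destination"]` still to fill in before saving.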

Step 4: Save and run

  1. Save the workflow.
  2. Trigger a run manually (e.g. “Run now”) to test, or wait for the schedule or source event. See Workflows overview.
  3. Monitor the run in the workflow run history. Check that the step completes and that output appears in the destination (e.g. new or updated records in the Vector Catalog). See Monitoring.
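If you monitor runs programmatically (for example via an API client or CLI you already have), the pattern is simply polling until a terminal state. A generic sketch — `get_status` stands in for whatever status call your setup provides; it is not a Bundata function:

```python
import time

def wait_for_run(get_status, timeout_s=60, poll_s=5):
    """Poll a run's status until it reaches a terminal state.

    get_status is any zero-argument callable returning one of the
    statuses shown in run history: queued, running, completed, failed.
    """
    terminal = {"completed", "failed"}
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in terminal:
            return status
        time.sleep(poll_s)
    return "timeout"
```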

Step 5: Monitor and fix failures

  1. Use the run history to see status (queued, running, completed, failed) and step-level details.
  2. If a run fails, inspect logs or error messages: common issues include invalid schema, missing source credentials, or destination errors. Fix the configuration and re-run. See Monitoring and Error handling.
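The common failures above lend themselves to a simple triage table. A sketch, assuming you can read the error message from run history — the message patterns and suggested fixes are illustrative, not actual Bundata error strings:

```python
# Hypothetical triage map: error-message pattern -> suggested fix.
FIXES = {
    "invalid schema": "Re-select the schema (and version) in the extraction step.",
    "missing source credentials": "Reconnect the source under Integrations → Sources.",
    "destination error": "Verify the destination (e.g. collection ID) under Integrations → Destinations.",
}

def suggest_fix(error_message):
    """Match a run's error message against known failure patterns."""
    lowered = error_message.lower()
    for pattern, fix in FIXES.items():
        if pattern in lowered:
            return fix
    return "Check step-level logs in run history."
```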

What happens behind the scenes

When the workflow runs, Bundata executes each step in order: ingestion (if configured), extraction using the chosen schema, optional validation, then vector publishing or delivery to the destination. Outputs are smart bites with source lineage and extraction confidence so downstream systems (Vector Search, agents, APIs) get consistent, traceable data. Workflows keep AI outputs up to date with the latest business content and support scaling GenAI workloads without scaling headcount.
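The execution order described above can be modeled as a fixed sequence of stages, with validation as the optional one. A toy sketch for intuition only — the stage names echo this page, and the lineage trace illustrates why downstream outputs stay traceable:

```python
def run_pipeline(doc, validate=True):
    """Toy model of the stage order: ingest -> extract -> validate -> publish.

    Each stage is recorded in a lineage trace, mirroring how outputs
    carry source lineage. Stage names are illustrative.
    """
    lineage = []
    for stage in ("ingest", "extract", "validate", "publish"):
        if stage == "validate" and not validate:
            continue  # validation is optional
        lineage.append(stage)
    return {"content": doc, "lineage": lineage}
```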

Common mistakes

  • No trigger — Ensure the workflow has a trigger (schedule or source event) or that you run it manually for testing.
  • Wrong schema or destination — Double-check schema ID and destination (e.g. collection ID) so extraction output lands where you expect.
  • Ignoring run history — Use monitoring to catch failures and fix configuration or source issues.

Next steps