REST API overview

The Bundata REST API gives you programmatic access to the full document intelligence layer: run context-aware extraction, query the Vector Catalog, manage schemas and workflows, and trigger runs. This page covers authentication, main endpoint groups, request/response patterns, and how to use the API in realistic Bundata workflows (contracts, invoices, policy docs).

Authentication

Every request must include a valid API key in the Authorization header:

Authorization: Bearer <your-api-key>

Create and manage keys in the Bundata dashboard. Use separate keys for development, staging, and production; never commit keys to source control. See Authentication. Invalid or missing keys return 401 Unauthorized.
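A minimal sketch of attaching the key to requests. Reading it from an environment variable keeps it out of source control; `BUNDATA_API_KEY` is an assumed variable name, not one documented here.

```python
import os

def auth_headers(api_key: str) -> dict:
    """Headers every Bundata API request needs: Bearer auth plus JSON content type."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# Assumed env var name -- pick whatever fits your deployment.
api_key = os.environ.get("BUNDATA_API_KEY", "")
```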

Base URL and versioning

  • Base URL — Provided in your account or deployment (e.g. https://api.bundata.com or your in-VPC host). Use the URL for your plan and region.
  • Versioning — The API may be versioned in the path (e.g. /v1/) or via headers. Prefer the version documented for your plan so behavior stays stable.

Main endpoint groups

Extraction

  • Run extraction — POST documents (or references, e.g. S3 URI) and a schema ID; get a job or run ID. Poll for completion or use webhooks. Response includes smart bites and source lineage.
  • Batch extraction — Submit multiple documents in one request; receive per-document status and results. Use for bulk processing of contracts or invoices. See Extraction runs.
  • Run status — GET run by ID to check status (queued, running, completed, failed) and fetch results when complete.

Use extraction endpoints to build pipelines that turn unstructured documents into schema-aware output for the Vector Catalog or your own store.
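The poll-for-completion pattern above can be sketched as a small loop. The status values (queued, running, completed, failed) come from the run-status endpoint described here; `get_status` stands in for a GET on the run by ID, whose exact path is in the API reference.

```python
import time

def wait_for_run(get_status, run_id: str, interval: float = 2.0, timeout: float = 300.0):
    """Poll until a run completes or fails. get_status is any callable that
    returns the run's current state, e.g. a GET on the run-status endpoint."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = get_status(run_id)
        if run["status"] == "completed":
            return run  # results are available now
        if run["status"] == "failed":
            raise RuntimeError(f"run {run_id} failed: {run.get('error')}")
        time.sleep(interval)  # still queued or running
    raise TimeoutError(f"run {run_id} did not finish within {timeout}s")
```

For long-running or batch extraction, prefer webhooks over polling where available; the same terminal-status handling applies.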

Vector Catalog

  • Search — POST a query (natural language or embedding) and collection ID; get ranked smart bites with metadata and source lineage. Optional metadata filters (source, date, document type). See Vector Search overview.
  • Collections — List collections, create/update collection config, and manage indexing options. See Vector Catalog: Collections.

Use search in RAG and agent flows to retrieve context for grounded answers.

Schemas

  • List / get schemas — List schemas in the workspace; get schema by ID and version. Use when starting extraction to resolve schema ID.
  • Create / update schema — Define or update schema (fields, types, required/optional). Prefer Schema Studio for visual design; use API for automation and versioning. See Schema field reference.
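An illustrative create-schema body for an invoice. The field and type names here are assumptions; see the Schema field reference for the documented shape.

```python
# Hypothetical schema body: fields with types and required/optional flags.
invoice_schema = {
    "name": "invoice-v1",
    "fields": [
        {"name": "invoice_number", "type": "string", "required": True},
        {"name": "total_amount", "type": "number", "required": True},
        {"name": "due_date", "type": "date", "required": False},
    ],
}
```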

Workflows

  • Trigger run — POST to start a workflow run (with optional trigger params, e.g. source path). Get run ID; poll or use webhooks for completion. See Workflows overview and Triggers & scheduling.
  • Run status — GET workflow run by ID; inspect step-level status and errors. See Monitoring.

Connectors

  • Source / destination config — Create or update connector configuration (credentials, scope, target). Used when building ingestion and delivery pipelines. See Source connectors and Destination connectors.

Exact paths and request/response shapes are in the API reference. The groups above give you the map for extraction → catalog → search and workflow orchestration.

Request and response patterns

  • Content type — Send JSON with Content-Type: application/json. Responses are JSON unless otherwise documented (e.g. file download).
  • IDs — Resources (runs, collections, schemas) are identified by ID. Include them in paths or body as required. Use idempotency keys for mutation endpoints when supported to avoid duplicate work on retries. See Error handling.
  • Pagination — List endpoints may return paginated results. Use limit and offset (or cursor) as documented. See Reference: API and Rate limits.
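The limit/offset pattern can be wrapped in a generator that walks every page. `fetch_page` stands in for a GET on any list endpoint; whether a given endpoint is offset- or cursor-based is per the API reference.

```python
def iter_all(fetch_page, limit: int = 100):
    """Yield every item across pages of an offset-paginated list endpoint."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page["items"]
        if len(page["items"]) < limit:
            break  # short page means we've reached the end
        offset += limit
```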

Error behavior

  • 4xx — Client error (bad request, unauthorized, not found). Fix the request; do not retry unchanged. See Error codes.
  • 429 — Rate limited. Retry after Retry-After with backoff. See Rate limits.
  • 5xx — Server error. Retry with backoff. See Error handling.

Error responses include a code and message; use the code for programmatic handling.

Example: extraction → catalog → search

  1. Create or get schema — Ensure a schema exists (Schema Studio or API). Note the schema ID.
  2. Run extraction — POST document(s) and schema ID. Get run ID; wait for completion (poll or webhook).
  3. Ingest to catalog — If not automatic, POST results to the Vector Catalog collection (or run a workflow that does extraction + ingest).
  4. Search — POST query and collection ID to search endpoint. Use results as context for an LLM for grounded answers with source lineage.

Next steps