API reference

This page is the reference for the Bundata REST API: endpoint groups, authentication, and how to use the API for extraction, Vector Catalog search, schemas, workflows, and connectors. For a conceptual overview and examples, see REST API overview.

Authentication

All requests require an API key in the header:

Authorization: Bearer <api_key>

Create and manage keys in the Bundata dashboard; see Authentication. A request with a missing or invalid key returns 401.

Base URL

Use the base URL for your plan and region (e.g. https://api.bundata.com or your in-VPC host). Paths below are relative to that base. A version prefix (e.g. /v1/) may apply; see your product docs.
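
As an illustration of how the Authorization header and base URL fit together, the following minimal sketch builds a request without sending it. It uses only the Python standard library. The helper name build_request is hypothetical, the base URL is the example host from this page, and the key value is a placeholder.

```python
import urllib.request

# Example host from this page; substitute the base URL for your plan and region.
BASE_URL = "https://api.bundata.com"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request that carries the Bearer API key.

    `path` is relative to BASE_URL; prepend a version prefix yourself
    if your deployment requires one.
    """
    return urllib.request.Request(
        url=BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Placeholder key for illustration only.
req = build_request("/schemas", api_key="YOUR_API_KEY")
```

Passing req to urllib.request.urlopen would send it; a missing or invalid key should then surface as an HTTP 401 error.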

Endpoint groups

Extraction

Method	Path (conceptual)	Description
POST	`/extraction/runs`	Start an extraction run. Body: document reference(s), schema_id, options. Returns run_id.
GET	`/extraction/runs/{run_id}`	Get run status and (when complete) results: smart bites, metadata, source lineage.
POST	`/extraction/batch`	Submit a batch of documents; get job_id. Poll or webhook for completion and per-document status.

The request body for a run typically includes document (a URL, base64 content, or a storage reference) and schema_id. The response includes the run status and, once the run completes, an array of smart bites conforming to the schema. See Extraction runs and Extraction overview.
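
The start-then-poll pattern for extraction runs can be sketched as follows. This is a minimal sketch under several assumptions: the nested {"url": ...} shape for document, the status field name, and the status values "running", "completed", and "failed" are all hypothetical, so check Extraction runs for the real shapes. The polling helper takes a fetch_run callable so that the HTTP call (for example a GET to /extraction/runs/{run_id}) stays outside the sketch.

```python
import time
from typing import Callable

# Assumed terminal status names; the real values may differ.
TERMINAL_STATUSES = {"completed", "failed"}


def build_run_body(document_url: str, schema_id: str) -> dict:
    """Body for POST /extraction/runs, assuming a URL document reference."""
    return {"document": {"url": document_url}, "schema_id": schema_id}


def wait_for_run(
    fetch_run: Callable[[], dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_run until the run reaches a terminal status.

    fetch_run should return the parsed JSON of the run-status endpoint.
    Raises TimeoutError if no terminal status is seen in max_attempts polls.
    """
    for _ in range(max_attempts):
        run = fetch_run()
        if run.get("status") in TERMINAL_STATUSES:
            return run
        sleep(interval_s)
    raise TimeoutError(f"run did not finish after {max_attempts} polls")
```

A webhook, where available, avoids polling altogether; the loop above is only a fallback for clients that cannot receive callbacks.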

Vector Catalog and search

Method	Path (conceptual)	Description
POST	`/search`	Semantic search. Body: collection_id, query (text or embedding), limit, optional metadata filters. Returns ranked smart bites with metadata and source lineage.
GET	`/collections`	List collections.
POST	`/collections`	Create collection. Body: name, schema_id (optional), indexing config.
GET	`/collections/{id}`	Get collection config.

A search request should include at least collection_id and query (or embedding). Use metadata filters to scope results. See Vector Search overview and Filtering.
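
A search request body might be assembled like this. The fields collection_id, query, and limit come from the table above; the key name filters and the shape of its value are assumptions made for illustration, so see Filtering for the actual syntax.

```python
import json
from typing import Optional


def build_search_body(
    collection_id: str,
    query: str,
    limit: int = 10,
    filters: Optional[dict] = None,
) -> dict:
    """Assemble a body for POST /search.

    `filters` is a hypothetical key name for the optional metadata filters.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    body = {"collection_id": collection_id, "query": query, "limit": limit}
    if filters:
        body["filters"] = filters
    return body


# Identifiers and filter values below are placeholders.
search_body = build_search_body(
    "col_contracts",
    "termination clauses longer than 30 days",
    limit=5,
    filters={"document_type": "contract"},
)
payload = json.dumps(search_body).encode("utf-8")
```

Omitting the filters key when no filter is given keeps the request minimal and avoids sending an empty object the server may not expect.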

Schemas

Method	Path (conceptual)	Description
GET	`/schemas`	List schemas in the workspace.
GET	`/schemas/{id}`	Get schema by ID (and optional version).
POST	`/schemas`	Create schema. Body: name, fields (see Schema field reference).
PATCH	`/schemas/{id}`	Update schema (may create new version depending on product).

The schema body includes fields, an array of { name, type, required, ... } objects. See Schema field reference and Schema Studio overview.
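
For illustration, a schema body and a defensive way to read fields from a result could look like the sketch below. The schema name, the field names, and the type values "string", "number", and "date" are hypothetical, and a smart bite is assumed here to be a flat JSON object keyed by field name; the Schema field reference has the supported types.

```python
from typing import Any

# Hypothetical schema body for POST /schemas.
schema_body = {
    "name": "invoice",
    "fields": [
        {"name": "invoice_number", "type": "string", "required": True},
        {"name": "total", "type": "number", "required": True},
        {"name": "due_date", "type": "date", "required": False},
    ],
}


def required_field_names(schema: dict) -> list:
    """Names of the fields marked required, in declaration order."""
    return [f["name"] for f in schema["fields"] if f.get("required")]


def read_field(bite: dict, name: str, default: Any = None) -> Any:
    """Read a field from a result, treating absent and null alike."""
    value = bite.get(name)
    return default if value is None else value
```

Reading optional fields through a helper like read_field means client code behaves the same whether the field is missing or present with a null value.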

Workflows

Method	Path (conceptual)	Description
POST	`/workflows/{id}/runs`	Trigger a workflow run. Optional body: trigger params (e.g. source path). Returns run_id.
GET	`/workflows/runs/{run_id}`	Get workflow run status and step-level results.

See Workflows overview and Triggers & scheduling.

Connectors

Method	Path (conceptual)	Description
GET	`/connectors`	List source and destination connectors.
POST	`/connectors`	Create connector. Body: type (source/destination), name, config (credentials, scope, target).
PATCH	`/connectors/{id}`	Update connector config.

See Source connectors and Destination connectors. The Connectors reference lists supported connectors and their parameters.

Response and errors

Success — 200 or 201 with JSON body. List endpoints may be paginated (limit, offset or cursor). See product docs for exact shapes.
Errors — 4xx/5xx with JSON body: code, message, optional details. See Error codes and Error handling. 429 indicates rate limiting; respect Retry-After. See Rate limits.
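
One way to turn an error response into a typed exception is sketched below. It relies only on the documented code, message, and optional details fields; the class name ApiError, the fallback code "unknown", and the handling of non-JSON bodies are choices of this sketch, not documented behavior.

```python
import json
from typing import Any, Optional


class ApiError(Exception):
    """Carries the HTTP status plus the code/message/details of an error body."""

    def __init__(self, status: int, code: str, message: str, details: Optional[Any] = None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details


def parse_error(status: int, raw_body: bytes) -> ApiError:
    """Build an ApiError from a 4xx/5xx response body.

    Falls back to code "unknown" when the body is not a JSON object,
    for example an HTML page returned by an intermediate proxy.
    """
    try:
        payload = json.loads(raw_body)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return ApiError(
        status,
        code=str(payload.get("code", "unknown")),
        message=str(payload.get("message", "")),
        details=payload.get("details"),
    )
```

Callers can then branch on err.status (for example 401 versus 429) and log err.code and err.details without parsing the body again.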

Pricing and page calculation

API usage may be billed or metered by pages processed. Pages are typically counted as follows:

PDF, PPTX, TIFF — One page = one page, slide, or image.
DOCX — If page metadata exists, use it; otherwise size-based rule (e.g. file size ÷ 100 KB).
Other file types — Pages = file size ÷ 100 KB (or as documented).
Non-file data — 100 KB of incoming data = one page.
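
The rules above can be sketched as a small estimator. Several details are assumptions rather than documented behavior: 1 KB is taken as 1024 bytes, partial pages are rounded up, and any non-empty input counts as at least one page. Treat the result as an estimate and rely on your plan or dashboard for billed figures.

```python
import math
from typing import Optional

PAGE_SIZE_BYTES = 100 * 1024  # assumes 1 KB = 1024 bytes
NATIVE_PAGE_TYPES = {"pdf", "pptx", "tiff"}


def estimate_pages(file_type: str, size_bytes: int, native_pages: Optional[int] = None) -> int:
    """Estimate billable pages for one input.

    file_type: extension such as "pdf" or "docx"; use "" for non-file data.
    native_pages: page, slide, or image count when the format provides one.
    """
    kind = file_type.lower().lstrip(".")
    if kind in NATIVE_PAGE_TYPES:
        if native_pages is None:
            raise ValueError(f"{kind} requires a native page count")
        return native_pages
    if kind == "docx" and native_pages is not None:
        return native_pages  # page metadata exists, so use it
    # Size-based rule: rounding up and the one-page minimum are assumptions.
    return max(1, math.ceil(size_bytes / PAGE_SIZE_BYTES))
```

For example, under these assumptions a 250 KB DOCX file without page metadata counts as 3 pages, while a 12-slide PPTX counts as 12 regardless of file size.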

See your plan and the Bundata Pricing page (or account dashboard) for exact pricing. See also Limits and quotas.

Idempotency

For mutation endpoints (e.g. start run, create connector), send an Idempotency-Key header (e.g. a UUID) so duplicate requests don’t create duplicate resources. The exact header name and behavior are documented in the product docs.
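
A minimal sketch of attaching an idempotency key follows. It assumes the header is literally named Idempotency-Key, as written above, and that the server deduplicates on it; the helper name mutation_headers is hypothetical. The important point is that a retry reuses the key generated for the first attempt.

```python
import uuid
from typing import Optional


def mutation_headers(api_key: str, idempotency_key: Optional[str] = None) -> dict:
    """Headers for a POST or PATCH call.

    Pass the key from the first attempt when retrying, so both
    attempts are recognized as the same logical request.
    """
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }


# First attempt generates a key; the retry reuses it.
first = mutation_headers("YOUR_API_KEY")
retry = mutation_headers("YOUR_API_KEY", idempotency_key=first["Idempotency-Key"])
```

Generating a fresh key for each retry would defeat the purpose, since the server would see two unrelated requests.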

Edge cases and examples

Large documents — Use async extraction (run ID + poll or webhook); avoid sync timeouts. See Extraction runs.
Empty or missing fields — Schema optional fields may be absent in response; handle nulls in your code. See Schema field reference.
Search with no results — Search returns an empty array when no bites match; combine with metadata filters to narrow. See Vector Search filtering.
Rate limits — Respect 429 and Retry-After; use exponential backoff. See Rate limits and Error handling.
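
The rate-limit advice above can be sketched as a delay calculator that prefers Retry-After and otherwise falls back to exponential backoff. The base delay, cap, and jitter strategy are illustrative choices rather than documented values, and only the seconds form of Retry-After is handled; the HTTP-date form falls through to backoff.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base_s: float = 1.0,
    cap_s: float = 60.0,
    jitter: bool = True,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Uses the Retry-After header value when it is a number of seconds;
    otherwise doubles the delay on each attempt, up to cap_s.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch
    delay = min(cap_s, base_s * (2 ** attempt))
    # Full jitter spreads retries from many clients across the window.
    return random.uniform(0.0, delay) if jitter else delay
```

A caller would sleep for retry_delay(attempt, response_headers.get("Retry-After")) after each 429 and give up after a fixed number of attempts.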

Next steps

REST API overview — Concepts and examples.
Authentication — API keys.
Error codes — Error reference.
Limits and quotas — Rate limits and quotas.