API reference
This page is the reference for the Bundata REST API: endpoint groups, authentication, and how to use the API for extraction, Vector Catalog search, schemas, workflows, and connectors. For a conceptual overview and examples, see REST API overview.
Authentication
All requests require an API key in the header:
Authorization: Bearer <api_key>
Create and manage keys in the Bundata dashboard; see Authentication. Requests with a missing or invalid key return 401.
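As a minimal sketch of the header above, the following builds (but does not send) an authenticated request with Python's standard library. The host is the example base URL from this page; the helper name `build_request` and the choice of `/schemas` as the path are illustrative assumptions.

```python
import urllib.request

# Example host from this page; substitute the base URL for your plan and region.
BASE_URL = "https://api.bundata.com"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build a request that carries the API key as a Bearer token (nothing is sent)."""
    if not api_key:
        raise ValueError("an API key is required; requests without one return 401")
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


request = build_request("/schemas", api_key="YOUR_API_KEY")
```

Passing the result to `urllib.request.urlopen` would send it; a version prefix, if your deployment uses one, would go in front of the path.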
Base URL
Use the base URL for your plan and region (e.g. https://api.bundata.com or your in-VPC host). Paths below are relative to that base. A version prefix (e.g. /v1/) may apply; see your product docs.
Endpoint groups
Extraction
| Method | Path (conceptual) | Description |
|---|---|---|
| POST | /extraction/runs | Start an extraction run. Body: document reference(s), schema_id, options. Returns run_id. |
| GET | /extraction/runs/{run_id} | Get run status and (when complete) results: smart bites, metadata, source lineage. |
| POST | /extraction/batch | Submit a batch of documents; get job_id. Poll or webhook for completion and per-document status. |
The request body for a run typically includes document (URL, base64, or storage reference) and schema_id. The response includes the run status and, once the run has completed, an array of smart bites conforming to the schema. See Extraction runs and Extraction overview.
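The start-then-poll flow can be sketched as below. The field names `document`, `schema_id`, and `options` come from the table above; the status values `"completed"` and `"failed"`, the `smart_bites` key used in the test data, and the `fetch_run` callable (standing in for a GET on `/extraction/runs/{run_id}`) are assumptions, since exact response shapes are in the product docs.

```python
import json
import time
from typing import Callable, Optional


def start_run_body(document: str, schema_id: str, options: Optional[dict] = None) -> bytes:
    """Serialize the JSON body for POST /extraction/runs."""
    body = {"document": document, "schema_id": schema_id}
    if options:
        body["options"] = options
    return json.dumps(body).encode("utf-8")


def wait_for_run(fetch_run: Callable[[], dict], interval: float = 2.0, max_polls: int = 30) -> dict:
    """Call fetch_run repeatedly until the run reaches a terminal status.

    The terminal status names used here are assumed, not documented on this page.
    """
    for _ in range(max_polls):
        run = fetch_run()
        if run.get("status") in ("completed", "failed"):
            return run
        time.sleep(interval)
    raise TimeoutError("run did not finish within the polling budget")
```

Injecting `fetch_run` keeps the polling logic independent of the HTTP client, so the same loop works with a webhook-free batch job or a test stub.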
Vector Catalog and search
| Method | Path (conceptual) | Description |
|---|---|---|
| POST | /search | Semantic search. Body: collection_id, query (text or embedding), limit, optional metadata filters. Returns ranked smart bites with metadata and source lineage. |
| GET | /collections | List collections. |
| POST | /collections | Create collection. Body: name, schema_id (optional), indexing config. |
| GET | /collections/{id} | Get collection config. |
A search request should include at least collection_id and query (text or embedding). Use metadata filters to scope results. See Vector Search overview and Filtering.
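A search body with the minimum fields might be assembled as follows. `collection_id`, `query`, and `limit` are taken from the table above; sending an embedding under a separate `embedding` key and filters under a `filters` key are assumptions, as is the filter shape shown in the example call.

```python
import json
from typing import Optional, Sequence


def search_body(
    collection_id: str,
    query: Optional[str] = None,
    embedding: Optional[Sequence[float]] = None,
    limit: int = 10,
    filters: Optional[dict] = None,
) -> str:
    """Serialize a body for POST /search, enforcing the minimum required fields."""
    if not collection_id:
        raise ValueError("collection_id is required")
    if query is None and embedding is None:
        raise ValueError("provide a text query or an embedding")
    body = {"collection_id": collection_id, "limit": limit}
    if query is not None:
        body["query"] = query
    if embedding is not None:
        body["embedding"] = list(embedding)  # assumed key name
    if filters:
        body["filters"] = filters  # assumed key name
    return json.dumps(body)


# Hypothetical filter: restrict results to one document type.
payload = search_body("col_1", query="termination clause", limit=5, filters={"doc_type": "contract"})
```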
Schemas
| Method | Path (conceptual) | Description |
|---|---|---|
| GET | /schemas | List schemas in the workspace. |
| GET | /schemas/{id} | Get schema by ID (and optional version). |
| POST | /schemas | Create schema. Body: name, fields (see Schema field reference). |
| PATCH | /schemas/{id} | Update schema (may create new version depending on product). |
Schema body includes fields: array of { name, type, required, ... }. See Schema field reference and Schema Studio overview.
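To illustrate the `fields` array, here is a small sketch that assembles a schema body from typed field definitions. The type names `"string"` and `"number"` and the example schema are assumptions; the Schema field reference has the supported types and any additional field attributes.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Field:
    name: str
    type: str  # type names are assumed here; check the Schema field reference
    required: bool = False


def schema_body(name: str, fields: List[Field]) -> str:
    """Serialize the JSON body for POST /schemas."""
    if not fields:
        raise ValueError("a schema needs at least one field")
    names = [f.name for f in fields]
    if len(names) != len(set(names)):
        raise ValueError("field names must be unique")
    return json.dumps({"name": name, "fields": [asdict(f) for f in fields]})


invoice_schema = schema_body(
    "invoice",
    [Field("invoice_number", "string", required=True), Field("total", "number")],
)
```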
Workflows
| Method | Path (conceptual) | Description |
|---|---|---|
| POST | /workflows/{id}/runs | Trigger a workflow run. Optional body: trigger params (e.g. source path). Returns run_id. |
| GET | /workflows/runs/{run_id} | Get workflow run status and step-level results. |
See Workflows overview and Triggers & scheduling.
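Triggering a workflow run could look like the sketch below, which builds the POST request without sending it. The parameter name `source_path` and the identifier `wf_42` are hypothetical; the trigger parameters a workflow accepts depend on how it is configured.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def workflow_run_request(
    base_url: str, workflow_id: str, api_key: str, params: Optional[dict] = None
) -> urllib.request.Request:
    """Build POST /workflows/{id}/runs with optional trigger parameters."""
    url = f"{base_url.rstrip('/')}/workflows/{quote(workflow_id, safe='')}/runs"
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if params:
        data = json.dumps(params).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, method="POST", headers=headers)


# Hypothetical trigger parameter pointing the workflow at a source path.
req = workflow_run_request(
    "https://api.bundata.com", "wf_42", "YOUR_API_KEY", {"source_path": "s3://bucket/inbox/"}
)
```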
Connectors
| Method | Path (conceptual) | Description |
|---|---|---|
| GET | /connectors | List source and destination connectors. |
| POST | /connectors | Create connector. Body: type (source/destination), name, config (credentials, scope, target). |
| PATCH | /connectors/{id} | Update connector config. |
See Source connectors and Destination connectors. Connectors reference lists supported connectors and parameters.
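A connector creation body follows the same pattern. In this sketch the `type`, `name`, and `config` keys come from the table above, while the keys inside `config` are hypothetical placeholders; real keys depend on the connector and are listed in the Connectors reference.

```python
import json

CONNECTOR_TYPES = ("source", "destination")


def connector_body(connector_type: str, name: str, config: dict) -> str:
    """Serialize the JSON body for POST /connectors."""
    if connector_type not in CONNECTOR_TYPES:
        raise ValueError(f"type must be one of {CONNECTOR_TYPES}")
    return json.dumps({"type": connector_type, "name": name, "config": config})


# The config keys below are hypothetical examples.
body = connector_body("source", "contracts-bucket", {"bucket": "contracts", "prefix": "inbox/"})
```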
Response and errors
- Success — 200 or 201 with JSON body. List endpoints may be paginated (limit, offset or cursor). See product docs for exact shapes.
- Errors — 4xx/5xx with JSON body: code, message, optional details. See Error codes and Error handling. 429 indicates rate limiting; respect Retry-After. See Rate limits.
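The error body and offset pagination can be handled along these lines. The `code`, `message`, and `details` keys are from this page; the fallback values, the `ApiError` class, and the assumption that a list endpoint returns a plain array per page (fetched through a hypothetical `fetch_page(limit, offset)` callable) are illustrative only.

```python
import json
from typing import Callable, Iterator, List, Optional


class ApiError(Exception):
    """Carries the fields of a 4xx/5xx JSON error body."""

    def __init__(self, status: int, code: str, message: str, details: Optional[dict] = None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details


def parse_error(status: int, raw_body: bytes) -> ApiError:
    """Convert an error response into ApiError, tolerating non-JSON bodies."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return ApiError(
        status,
        payload.get("code", "unknown"),
        payload.get("message", ""),
        payload.get("details"),
    )


def iter_items(fetch_page: Callable[[int, int], List[dict]], limit: int = 50) -> Iterator[dict]:
    """Walk an offset-paginated list endpoint until a short page is returned."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit
```

A cursor-based endpoint would need a different loop that passes the cursor from each response into the next request.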
Pricing and page calculation
API usage may be billed or metered by pages processed. Page calculation (typical):
- PDF, PPTX, TIFF — One page = one page, slide, or image.
- DOCX — If the file carries page metadata, that page count is used; otherwise a size-based rule applies (e.g. pages = file size ÷ 100 KB).
- Other file types — Pages = file size ÷ 100 KB (or as documented).
- Non-file data — 100 KB of incoming data = one page.
See your plan and the Bundata Pricing page (or account dashboard) for exact pricing. See also Limits and quotas.
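The typical rules above can be turned into a rough estimator. This is only a sketch: rounding partial pages up, charging at least one page, and treating 1 KB as 1024 bytes are assumptions, since this page does not specify them, and actual metering is defined by your plan.

```python
import math
from typing import Optional

PAGE_BYTES = 100 * 1024  # assumes 1 KB = 1024 bytes


def estimate_pages(extension: Optional[str], size_bytes: int, unit_count: Optional[int] = None) -> int:
    """Estimate billable pages.

    extension  -- file extension, or None for non-file data
    unit_count -- pages, slides, or images reported by the file, if known
    """
    ext = (extension or "").lower().lstrip(".")
    if ext in ("pdf", "pptx", "tiff"):
        if unit_count is None:
            raise ValueError("page, slide, or image count is needed for this file type")
        return unit_count
    if ext == "docx" and unit_count is not None:
        return unit_count  # page metadata exists, so use it
    # DOCX without metadata, other file types, and non-file data: size-based rule.
    return max(1, math.ceil(size_bytes / PAGE_BYTES))
```

Under these assumptions, a 250 KB DOCX without page metadata counts as 3 pages, while a 12-page PDF counts as 12 regardless of file size.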
Idempotency
For mutation endpoints (e.g. start run, create connector), send an Idempotency-Key header (e.g. UUID) so duplicate requests don’t create duplicate resources. Exact header name and behavior are in product docs.
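A short sketch of the pattern: generate one key per logical operation and send the same key on every retry of that operation. The header name `Idempotency-Key` is the one mentioned above, but the exact name and server behavior should be confirmed in the product docs; the helper name is illustrative.

```python
import uuid
from typing import Optional


def mutation_headers(api_key: str, idempotency_key: Optional[str] = None) -> dict:
    """Headers for a mutation request; a fresh UUID is generated if no key is given."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }


first_attempt = mutation_headers("YOUR_API_KEY")
# A retry of the same logical operation carries the key from the first attempt.
retry = mutation_headers("YOUR_API_KEY", first_attempt["Idempotency-Key"])
```

Generating a new key for a retry would defeat the purpose, because the server could then treat the retry as a separate request.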
Edge cases and examples
- Large documents — Use async extraction (run ID + poll or webhook); avoid sync timeouts. See Extraction runs.
- Empty or missing fields — Schema optional fields may be absent in response; handle nulls in your code. See Schema field reference.
- Search with no results — Search returns an empty array when no bites match; combine with metadata filters to narrow. See Vector Search filtering.
- Rate limits — Respect 429 and Retry-After; use exponential backoff. See Rate limits and Error handling.
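The rate-limit advice above can be sketched as a delay calculation: honor `Retry-After` when the server sends it as a number of seconds, and otherwise fall back to capped exponential backoff with random jitter. The base delay, the cap, and the use of jitter are assumptions chosen for illustration, and the HTTP-date form of `Retry-After` is not parsed here.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a number of seconds; fall back to backoff below
    ceiling = min(cap, base * (2 ** attempt))
    # Pick a random delay up to the ceiling so clients do not retry in lockstep.
    return random.uniform(0, ceiling)
```

A caller would sleep for `retry_delay(attempt, response_headers.get("Retry-After"))` after each 429 and give up after a fixed number of attempts.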
Next steps
- REST API overview — Concepts and examples.
- Authentication — API keys.
- Error codes — Error reference.
- Limits and quotas — Rate limits and quotas.