Key concepts

This page introduces the main concepts you need to use Bundata effectively: how accounts and workspaces are organized, how you access the platform, and how data flows from sources to smart bites.

Accounts and workspaces

A Bundata account is the top-level entity for billing and support. Your organization may have one or more workspaces depending on your plan.

A workspace is your environment for running pipelines, managing connectors, and accessing the Vector Catalog. Within a workspace you configure sources, define schemas, run extraction and enrichment, and send results to destinations. Multiple users can be invited to the same workspace with configurable access.

Authentication and access

API key

Use an API key to authenticate with the Bundata API. Create and manage API keys in the dashboard, and send the key in the Authorization header of each REST request (e.g. Authorization: Bearer <key>) for partitioning, enrichment, chunking, and batch processing. Keep keys secure and rotate them according to your security policy.
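As a rough illustration of the bearer-token header mentioned above, the following Python sketch builds an authenticated request without sending it. The base URL, the /partition path, and the payload shape are placeholders assumed for this example, not documented Bundata endpoints; consult the API Reference for the real values.

```python
import json
import urllib.request

# Placeholder base URL (assumption): replace with the value from the API Reference.
BASE_URL = "https://api.example.com/v1"


def build_request(api_key: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the API key as a bearer token."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical path and payload, used only to show the header being attached.
request = build_request("my-api-key", "/partition", {"file_url": "s3://bucket/doc.pdf"})
```

In practice, load the key from an environment variable or a secret manager rather than hardcoding it, so that rotating the key does not require a code change.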

User and identity

User identities are typically tied to your account through an identifier such as an email address. Access to workspaces, connectors, and pipelines can be restricted by role so that teams see only what they need.

Bundata interfaces

UI (Platform)

The Bundata Platform (web UI) lets you design and run pipelines without code. Use Extraction Studio to upload files, define schemas, and run extraction. Use Workflows to chain ingestion, transformation, and delivery. The UI is ideal for exploring data, iterating on schemas, and operating pipelines visually.

API

The Bundata API provides programmatic access to the same capabilities: partition, enrich, chunk, and embed. Use the API for automation, CI/CD, and custom integrations. See API Reference.

CLI and SDKs

Where available, the CLI and SDKs wrap the API so you can script Bundata and integrate it into your toolchain.

Data flow and pipeline concepts

Connectors (sources and destinations)

Source connectors pull documents from your storage and apps: S3, Azure Blob, Google Cloud Storage, SharePoint, Confluence, Google Drive, Salesforce, and many more. Destination connectors send smart bites and embeddings to vector stores, databases, or object storage. See Integrations overview.

Schema

A schema defines the structure of your output: which elements to extract, which metadata to attach, and how chunks are shaped. You define schemas in the UI (Schema Studio) or via API. Schemas keep your smart bites consistent and predictable for downstream RAG and agents.
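To make the idea concrete, here is a minimal Python sketch of what a schema does: it names the elements to extract, the metadata to attach, and a chunk size, then shapes every record to the same keys. The class, its field names, and the default chunk size are assumptions made for illustration; they are not Bundata's actual schema format.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Illustrative schema; the field names are assumptions, not Bundata's format."""
    elements: list[str]                                  # which elements to extract
    metadata: list[str] = field(default_factory=list)    # which metadata to attach
    max_chunk_chars: int = 1000                          # how chunks are shaped (hypothetical)

    def validate(self) -> None:
        if not self.elements:
            raise ValueError("a schema must extract at least one element")
        if self.max_chunk_chars <= 0:
            raise ValueError("max_chunk_chars must be positive")


def shape_record(schema: Schema, record: dict) -> dict:
    """Keep only the keys the schema names, so every output has the same structure."""
    keys = schema.elements + schema.metadata
    return {key: record.get(key) for key in keys}
```

Because missing fields are filled with None and unknown fields are dropped, downstream consumers can rely on a fixed set of keys, which is the consistency property the schema is meant to provide.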

Pipelines and workflows

A pipeline is the end-to-end path from source to destination: ingest → partition → enrich → chunk → (optionally) embed → deliver. In the UI, you build workflows that schedule and run these steps. Workflows can run on a schedule, on demand, or in response to new data.
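The stage order above, including the optional embed step, can be sketched in Python as follows. The function names and the handler interface are hypothetical and exist only to show the ordering; they do not correspond to a Bundata SDK.

```python
from typing import Callable

# A stage takes a document (modelled as a dict) and returns the updated document.
Stage = Callable[[dict], dict]

# Stage order as described in the text; "embed" is the optional step.
STAGE_ORDER = ["ingest", "partition", "enrich", "chunk", "embed", "deliver"]


def plan_stages(embed: bool = True) -> list[str]:
    """Return the stages to run, skipping "embed" when it is disabled."""
    return [name for name in STAGE_ORDER if embed or name != "embed"]


def run_pipeline(document: dict, handlers: dict[str, Stage], embed: bool = True) -> dict:
    """Apply each planned stage's handler in order and return the final document."""
    for name in plan_stages(embed):
        document = handlers[name](document)
    return document
```

Keeping the order in a single list means a schedule, an on-demand run, and a new-data trigger can all call the same run_pipeline function and produce the same sequence of steps.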

Vector Catalog

The Vector Catalog is Bundata’s managed store for embeddings and indexed smart bites. After chunking and embedding, you can send results to the Vector Catalog for search and RAG. The catalog supports semantic search and integration with agents and applications.

Billing and usage

Billing is at the account level. Usage is often measured in pages processed: one PDF page counts as one page, and size-based rules apply to other file types. See API Reference and Sign up for plan details.
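The page-counting idea can be sketched as a small Python function. Only the PDF rule (one PDF page is one billable page) comes from the text above. The size-based rule is not specified here, so the bytes_per_page parameter and the round-up behavior are assumptions that the caller must supply from the actual plan details.

```python
from typing import Optional


def billable_pages(
    is_pdf: bool,
    page_count: int = 0,
    size_bytes: int = 0,
    bytes_per_page: Optional[int] = None,
) -> int:
    """Estimate billable pages for one file.

    PDFs: one PDF page counts as one page (from the documentation).
    Other types: size divided by bytes_per_page, rounded up. Both the
    parameter and the rounding are assumptions, not a documented rule.
    """
    if is_pdf:
        return page_count
    if bytes_per_page is None or bytes_per_page <= 0:
        raise ValueError("bytes_per_page must be set from your plan's size-based rule")
    # Integer ceiling division avoids floating-point rounding.
    return (size_bytes + bytes_per_page - 1) // bytes_per_page
```

The function deliberately has no default for bytes_per_page, so an estimate for non-PDF files cannot be produced without looking up the real rule first.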

Next steps

Sign up — Create an account and choose a plan
Quickstart — Run your first pipeline
Integrations overview — Connectors and cloud deployments