Quickstart: from PDF to vector-ready intelligence

This guide gets you from zero to your first extraction in a few steps. You can use the Platform UI or the API. By the end you will have run context-aware extraction on a document and produced smart bites ready for the Vector Catalog, RAG, or agents.

Overview

Bundata turns unstructured documents into vector-ready intelligence. The quickstart path: sign up, optionally connect a source, define a schema, run extraction, then use the output (e.g. send to the Vector Catalog or run semantic search). Key concepts you will touch: smart bites, schema-aware extraction, source lineage, and extraction confidence.
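To make these concepts concrete, here is a minimal sketch of what one smart bite might look like as a Python record. The `SmartBite` class, its field names, and the `confident_bites` helper are hypothetical and are not Bundata's actual output format. They only show how source lineage and extraction confidence can travel with each extracted unit, and how confidence can gate what you keep.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SmartBite:
    """Hypothetical record for one extracted unit; all field names are illustrative."""
    text: str                       # the content the fields were extracted from
    fields: dict[str, Any]          # schema-aware extraction output, keyed by schema field
    source_uri: str                 # source lineage: which document this came from
    page: Optional[int] = None      # source lineage: where in that document
    confidence: float = 1.0         # extraction confidence, assumed to be in [0, 1]
    metadata: dict[str, Any] = field(default_factory=dict)


def confident_bites(bites: list[SmartBite], threshold: float = 0.8) -> list[SmartBite]:
    """Keep only bites whose extraction confidence meets the threshold."""
    return [b for b in bites if b.confidence >= threshold]


bites = [
    SmartBite("Invoice total: $1,200", {"total": 1200}, "s3://my-bucket/inv-001.pdf", page=1, confidence=0.95),
    SmartBite("Due date: illegible", {"due_date": None}, "s3://my-bucket/inv-001.pdf", page=2, confidence=0.42),
]
for bite in confident_bites(bites):
    print(bite.source_uri, bite.page, bite.fields)
```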

1. Sign up

Create a Bundata account and sign in. From the dashboard, open Extraction Studio or the Platform to build pipelines.

UI — Upload documents, define schemas, and run extraction from the web interface.
API — Use your API key to call the Bundata API for partitioning, enrichment, and embedding. See API reference and REST API overview.
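If you take the API route, every call carries your API key. The sketch below builds, but does not send, an authenticated JSON request using only the Python standard library. The base URL, the `/v1/partition` path, the `BUNDATA_API_URL`/`BUNDATA_API_KEY` environment variable names, and the bearer-token header are all assumptions for illustration; take the real values from the API reference.

```python
import json
import os
import urllib.request

# Hypothetical values: the real base URL, paths, and auth scheme are in the API reference.
BASE_URL = os.environ.get("BUNDATA_API_URL", "https://api.example.invalid")
API_KEY = os.environ.get("BUNDATA_API_KEY", "")


def build_request(path: str, payload: dict, api_key: str = API_KEY) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request authenticated with an API key."""
    if not api_key:
        raise ValueError("Set BUNDATA_API_KEY before calling the API")
    return urllib.request.Request(
        url=f"{BASE_URL.rstrip('/')}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed bearer-token scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("/v1/partition", {"file_url": "s3://my-bucket/report.pdf"}, api_key="test-key")
print(req.full_url, req.get_method())
# Sending it would be: urllib.request.urlopen(req)
```

Keeping the key in an environment variable rather than in source code means the same script works across environments without leaking credentials.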

2. Connect a source (optional)

Connect a data source such as S3, Google Drive, or SharePoint, or upload files directly. Bundata supports 35+ connectors.

In the UI: Use the source picker in Extraction Studio or Workflows.
Via API: Use the ingest or batch endpoints with your source credentials.
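When passing source credentials through the API, read them from the environment instead of hard-coding them. The sketch below assembles an S3 source description for an ingest request. The dictionary keys (`type`, `bucket`, `prefix`, `credentials`) and the `s3_source_config` helper are hypothetical, not Bundata's request schema; only the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variable names are the standard AWS ones.

```python
import os
from typing import Mapping


def s3_source_config(bucket: str, prefix: str = "", env: Mapping[str, str] = os.environ) -> dict:
    """Describe an S3 source for a hypothetical ingest request.

    Credentials are read from the standard AWS environment variables instead of
    being hard-coded. The returned keys are illustrative, not Bundata's schema.
    """
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not (access_key and secret_key):
        raise RuntimeError("AWS credentials not found in the environment")
    return {
        "type": "s3",
        "bucket": bucket,
        "prefix": prefix,
        "credentials": {"access_key_id": access_key, "secret_access_key": secret_key},
    }


# Example: config = s3_source_config("my-bucket", prefix="invoices/")
```

The `env` parameter defaults to the real process environment but accepts any mapping, which makes the helper easy to test without touching actual credentials.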

3. Define a schema

Define what you want to extract:

Infer from sample JSON — Provide an example output; Bundata can suggest a schema.
Schema Studio — Design and edit schemas in the UI with validation and preview.
API — Create or update schemas programmatically.

Schemas define the structure of your smart bites (metadata, entities, tables).
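Inferring a schema from sample JSON can be pictured as walking the example output and recording the type of each value. The sketch below is a local illustration of that idea that produces a JSON Schema-style dictionary; it is not Bundata's inference logic, and the `infer_schema` function is hypothetical.

```python
import json
from typing import Any


def infer_schema(value: Any) -> dict:
    """Infer a minimal JSON Schema-style description from one sample value."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Treat the first element as representative; an empty list gives an open item schema.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(v) for key, v in value.items()},
            "required": sorted(value),  # every key seen in the sample is marked required
        }
    raise TypeError(f"Unsupported JSON value: {value!r}")


sample = json.loads('{"vendor": "Acme", "total": 1200.5, "line_items": [{"sku": "A1", "qty": 2}]}')
print(json.dumps(infer_schema(sample), indent=2))
```

A single sample cannot tell optional fields from required ones, which is why an inferred schema is a starting point to refine in Schema Studio rather than a finished definition.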

4. Run extraction

Run extraction on your documents. Bundata will:

Partition and optionally chunk the content.
Extract fields according to your schema.
Apply enrichments: metadata, named-entity recognition (NER), and image/table descriptions if enabled.
Output structured, vector-ready smart bites.
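The partition and chunk steps can be pictured with a toy version: split a document into elements, then pack whole elements into size-bounded chunks. This sketch rests on simplifying assumptions (blank lines as element boundaries, a character budget) and is not Bundata's partitioner; real PDF partitioning also has to deal with layout, tables, and images. The `partition` and `chunk` functions are hypothetical.

```python
def partition(text: str) -> list[str]:
    """Split text into elements on blank lines (a stand-in for real partitioning)."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def chunk(elements: list[str], max_chars: int = 500) -> list[str]:
    """Greedily pack whole elements into chunks of at most max_chars characters.

    An element longer than max_chars becomes its own chunk rather than being split.
    """
    chunks: list[str] = []
    current = ""
    for element in elements:
        candidate = f"{current}\n\n{element}" if current else element
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = element
    if current:
        chunks.append(current)
    return chunks


doc = "Title\n\nFirst paragraph.\n\nSecond paragraph."
print(chunk(partition(doc), max_chars=25))
```

Packing whole elements, instead of cutting at a fixed character offset, keeps each chunk aligned with the document's own boundaries, which tends to make extracted fields and search hits easier to trace back to the source.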

In the UI, use Run Extraction. Via API, use the batch processing endpoints.
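If the batch processing endpoints run jobs asynchronously (an assumption here), client code submits a job and then polls for its status. The sketch below shows only the polling half. `get_status` stands in for whatever status call the API actually exposes, and the `succeeded`/`failed` terminal states, the `wait_for_job` helper, and the default interval and timeout are all hypothetical.

```python
import time
from typing import Callable

TERMINAL_STATES = {"succeeded", "failed"}  # hypothetical status values


def wait_for_job(get_status: Callable[[str], str], job_id: str,
                 interval: float = 5.0, timeout: float = 600.0) -> str:
    """Poll get_status(job_id) until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        time.sleep(interval)


# Usage with a fake status source in place of a real API call:
fake_statuses = iter(["queued", "running", "succeeded"])
print(wait_for_job(lambda _job_id: next(fake_statuses), "job-123", interval=0))
```

Passing the status lookup in as a function keeps the polling logic independent of any particular HTTP client, and using `time.monotonic()` keeps the timeout correct even if the system clock changes.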

5. Use the output

Send results to:

Vector Catalog — Index smart bites for search and RAG.
Agents — Feed structured context to agentic AI.
Workflows — Trigger downstream actions or export to your systems.
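Search and RAG over indexed smart bites rest on comparing embedding vectors. As a rough picture of what a query does, here is brute-force cosine-similarity ranking over a small in-memory index. It is an illustration only: the bite IDs and three-dimensional vectors are made up, the Vector Catalog's actual indexing and similarity metric are not described in this guide, and production vector stores typically use approximate nearest-neighbor indexes rather than a linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k bite IDs whose embeddings are most similar to the query."""
    scored = [(bite_id, cosine(query_vec, vec)) for bite_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real embeddings come from an embedding model.
index = {
    "bite-invoice-total": [0.9, 0.1, 0.0],
    "bite-contract-term": [0.1, 0.8, 0.3],
    "bite-shipping-note": [0.2, 0.2, 0.9],
}
print(search([1.0, 0.0, 0.0], index, k=2))
```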

Next steps

First extraction — Step-by-step first extraction.
First schema — Create your first schema in Schema Studio.
Vector Search overview — Query with semantic search.