Quickstart: from PDF to vector-ready intelligence
This guide takes you from zero to your first extraction in five steps, using either the Platform UI or the API. By the end, you will have run context-aware extraction on a document and produced smart bites ready for the Vector Catalog, RAG, or agents.
Overview
Bundata turns unstructured documents into vector-ready intelligence. The quickstart path is: sign up, optionally connect a source, define a schema, run extraction, and then use the output (for example, send it to the Vector Catalog or run semantic search). Key concepts you will encounter: smart bites, schema-aware extraction, source lineage, and extraction confidence.
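To make those concepts concrete, here is a minimal sketch of what a single smart bite might carry: the content, the schema-aware fields extracted from it, its source lineage, and an extraction confidence. The `SmartBite` class, its field names, and the `high_confidence` helper are hypothetical illustrations, not Bundata's actual output format; the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SmartBite:
    """Hypothetical smart bite: one extracted unit plus its provenance."""

    text: str                  # the content this bite represents
    fields: dict[str, Any]     # schema-aware extracted values
    source_uri: str            # source lineage: which document it came from
    page: int                  # source lineage: where in that document
    confidence: float          # extraction confidence, assumed to be in [0.0, 1.0]
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject confidences outside the assumed [0, 1] range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


def high_confidence(bites: list[SmartBite], threshold: float = 0.8) -> list[SmartBite]:
    """Keep only bites whose extraction confidence meets the threshold."""
    return [b for b in bites if b.confidence >= threshold]


bites = [
    SmartBite("Invoice total: $1,200", {"total": 1200}, "s3://docs/inv-001.pdf", 1, 0.95),
    SmartBite("Due date unclear", {"due_date": None}, "s3://docs/inv-001.pdf", 2, 0.40),
]
print([b.page for b in high_confidence(bites)])  # [1]
```

Keeping lineage and confidence on every bite lets downstream consumers (search, RAG, agents) filter low-confidence extractions and trace any answer back to its source page.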
1. Sign up and open the platform
Create a Bundata account and sign in. From the dashboard, open Extraction Studio or the Platform to build pipelines. You can work in either of two ways:
- UI: Upload documents, define schemas, and run extraction from the web interface.
- API: Use your API key to call the Bundata API for partitioning, enrichment, and embedding. See the API reference and the REST API overview.
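If you take the API route, every call carries your API key. The sketch below builds, but does not send, an authenticated JSON request using only the Python standard library. The base URL, the `/partition` path, the bearer-token header, the `BUNDATA_API_KEY` environment variable, and the payload shape are all placeholders assumed for illustration; check the API reference for the real values.

```python
import json
import os
import urllib.request

# Placeholder base URL: replace with the one given in the API reference.
BASE_URL = "https://api.example.com/v1"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed bearer-token scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Read the key from the environment rather than hard-coding it (variable name is hypothetical).
api_key = os.environ.get("BUNDATA_API_KEY", "test-key")
req = build_request("/partition", {"document_url": "s3://my-bucket/report.pdf"}, api_key)
print(req.full_url, req.get_method())
# To actually send it: urllib.request.urlopen(req)
```

Keeping the key in an environment variable keeps it out of source control and lets the same script run against different accounts.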
2. Connect a source (optional)
Connect a data source such as S3, Google Drive, or SharePoint, or upload files directly. Bundata supports more than 35 connectors.
- In the UI: Use the source picker in Extraction Studio or Workflows.
- Via the API: Use the ingest or batch endpoints with your source credentials.
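When ingesting many documents through the API, you will typically describe the source once and send the documents in fixed-size batches. The sketch below shows that pattern locally. `SourceConfig`, its fields, and the printed request body are hypothetical; the actual connector parameters and batch payload format are defined in the API reference.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical source descriptor; field names are illustrative only."""

    kind: str            # e.g. "s3", "google_drive", "sharepoint"
    location: str        # bucket path, folder ID, or site URL
    credential_ref: str  # name of a secret or env var, never the secret itself


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Split a list of document keys into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


source = SourceConfig(kind="s3", location="my-bucket/contracts", credential_ref="AWS_PROFILE")
keys = [f"contract-{i}.pdf" for i in range(5)]
for batch in batches(keys, size=2):
    # Each batch would become one request body for an ingest or batch endpoint.
    print({"source": source.kind, "location": source.location, "documents": batch})
```

Referencing credentials by name rather than embedding them keeps secrets out of request logs and configuration files.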
3. Define a schema
Define what you want to extract:
- Infer from sample JSON — Provide an example output; Bundata can suggest a schema.
- Schema Studio — Design and edit schemas in the UI with validation and preview.
- API — Create or update schemas programmatically.
Schemas define the structure of your smart bites (metadata, entities, tables).
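To illustrate the "infer from sample JSON" idea, the sketch below derives a minimal JSON Schema fragment (`type`, `properties`, `required`, `items`) from one example output. This is a generic technique shown for intuition, not Bundata's inference logic, which may also use type hints, multiple samples, or descriptions. The invoice sample is made up.

```python
import json
from typing import Any


def infer_schema(value: Any) -> dict[str, Any]:
    """Infer a minimal JSON Schema fragment from a single sample value."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Use the first element as the item template; an empty list gives no constraint.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),  # every key seen in the sample is treated as required
        }
    raise TypeError(f"unsupported type: {type(value).__name__}")


sample = json.loads('{"vendor": "Acme", "total": 1200.5, "line_items": [{"sku": "A1", "qty": 2}]}')
print(json.dumps(infer_schema(sample), indent=2))
```

A single sample can only suggest a schema: optional fields, enums, and value ranges still need review, which is where editing in Schema Studio comes in.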
4. Run extraction
Run extraction on your documents. Bundata will:
- Partition and optionally chunk the content.
- Extract fields according to your schema.
- Apply enrichments: metadata, named entity recognition (NER), and, if enabled, image and table descriptions.
- Output structured, vector-ready smart bites.
In the UI, use Run Extraction. Via the API, use the batch processing endpoints.
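The partition and chunk steps can be pictured with a simple local sketch: split text into paragraph elements, then cut each element into overlapping fixed-size windows while recording character offsets, so every chunk keeps its source lineage. Splitting on blank lines and using fixed character windows are assumptions made for illustration; Bundata's own partitioning and chunking strategies may differ.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: int  # character offset of the chunk in the source text (lineage)
    end: int    # exclusive end offset


def partition(text: str) -> list[tuple[int, str]]:
    """Split text into paragraph elements on blank lines, keeping start offsets."""
    elements: list[tuple[int, str]] = []
    offset = 0
    for block in text.split("\n\n"):
        if block.strip():
            elements.append((offset, block))
        offset += len(block) + 2  # +2 accounts for the "\n\n" separator
    return elements


def chunk(elements: list[tuple[int, str]], max_chars: int = 200, overlap: int = 20) -> list[Chunk]:
    """Cut each element into overlapping character windows; offsets map back to the source."""
    if not 0 <= overlap < max_chars:
        raise ValueError("need 0 <= overlap < max_chars")
    step = max_chars - overlap
    chunks: list[Chunk] = []
    for start, block in elements:
        for i in range(0, len(block), step):
            piece = block[i:i + max_chars]
            chunks.append(Chunk(piece, start + i, start + i + len(piece)))
            if i + max_chars >= len(block):  # this window already reached the end of the block
                break
    return chunks


doc = "Invoice 001\n\nAcme Corp owes $1,200 for consulting services delivered in March."
for c in chunk(partition(doc), max_chars=40, overlap=10):
    print(c.start, c.end, repr(c.text))
```

Because each chunk stores its offsets, `doc[c.start:c.end]` always recovers the exact source span, which is what makes citations and lineage checks possible later.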
5. Use the output
Send results to:
- Vector Catalog — Index smart bites for search and RAG.
- Agents — Feed structured context to agentic AI.
- Workflows — Trigger downstream actions or export to your systems.
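Indexing smart bites for search comes down to storing one embedding vector per bite and ranking bites by similarity to a query vector. The sketch below is an in-memory stand-in using cosine similarity, not the Vector Catalog API; the bite IDs and the 3-dimensional toy vectors are made up, whereas real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory stand-in for a vector index (hypothetical, not the Vector Catalog)."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, bite_id: str, vector: list[float]) -> None:
        self._items.append((bite_id, vector))

    def search(self, query: list[float], k: int = 2) -> list[tuple[str, float]]:
        """Return the k most similar bites as (bite_id, score), best first."""
        scored = [(bite_id, cosine(query, vec)) for bite_id, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = ToyIndex()
index.add("bite-invoice-total", [0.9, 0.1, 0.0])
index.add("bite-due-date", [0.1, 0.9, 0.0])
index.add("bite-vendor", [0.0, 0.2, 0.9])
print(index.search([1.0, 0.0, 0.0], k=1))
```

A production index replaces this linear scan with an approximate nearest-neighbor structure, but the ranking idea is the same.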
Next steps
- First extraction: A step-by-step walkthrough of your first extraction.
- First schema: Create your first schema in Schema Studio.
- Vector Search overview: Query your indexed smart bites with semantic search.