Quickstart: from PDF to vector-ready intelligence

This guide gets you from zero to your first extraction in a few steps. You can use the Platform UI or the API. By the end you will have run context-aware extraction on a document and produced smart bites ready for the Vector Catalog, RAG, or agents.

Overview

Bundata turns unstructured documents into vector-ready intelligence. The quickstart path: sign up, optionally connect a source, define a schema, run extraction, then use the output (e.g. send to the Vector Catalog or run semantic search). Key concepts you will touch: smart bites, schema-aware extraction, source lineage, and extraction confidence.

1. Sign up and open the platform

Create a Bundata account and sign in. From the dashboard, open Extraction Studio or the Platform to build pipelines.

  • UI — Upload documents, define schemas, and run extraction from the web interface.
  • API — Use your API key to call the Bundata API for partitioning, enrichment, and embedding. See API reference and REST API overview.
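As a rough illustration of the API route, the sketch below builds (but does not send) an authenticated JSON request using only the Python standard library. The base URL, the `/partition` path, the `document_url` field, and the bearer-token header are all assumptions made for this example; check the API reference for the actual values.

```python
import json
import urllib.request

# Placeholder base URL: NOT a real Bundata address. Take the real one from the API reference.
BASE_URL = "https://api.example.com/v1"


def build_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) a JSON POST request that carries the API key."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# Hypothetical path and payload field, for illustration only.
req = build_request(
    "/partition",
    "YOUR_API_KEY",
    {"document_url": "https://example.com/report.pdf"},
)
```

Passing `req` to `urllib.request.urlopen` would send it once the placeholders are replaced with real values.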

2. Connect a source (optional)

Connect a data source such as S3, Google Drive, or SharePoint, or upload files directly. Bundata supports 35+ connectors.

  • In the UI: Use the source picker in Extraction Studio or Workflows.
  • Via API: Use the ingest or batch endpoints with your source credentials.
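One way to prepare source credentials for an API call is to read them from the environment rather than hard-coding them. The sketch below assembles a source description for an S3 bucket; every field name in it (`kind`, `bucket`, `prefix`, and so on) is hypothetical and may not match the payload the ingest endpoints expect.

```python
import os
from dataclasses import asdict, dataclass


@dataclass
class S3Source:
    # Field names are illustrative, not a documented Bundata schema.
    bucket: str
    prefix: str
    access_key_id: str
    secret_access_key: str
    kind: str = "s3"


def s3_source_from_env(bucket: str, prefix: str = "", env=None) -> S3Source:
    """Read credentials from the standard AWS variable names in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return S3Source(
        bucket=bucket,
        prefix=prefix,
        access_key_id=env["AWS_ACCESS_KEY_ID"],
        secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
    )


# Demo with an explicit mapping so the example runs without real credentials.
demo_env = {
    "AWS_ACCESS_KEY_ID": "example-key-id",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
}
payload = asdict(s3_source_from_env("my-bucket", "reports/", env=demo_env))
```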

3. Define a schema

Define what you want to extract:

  • Infer from sample JSON — Provide an example output; Bundata can suggest a schema.
  • Schema Studio — Design and edit schemas in the UI with validation and preview.
  • API — Create or update schemas programmatically.

Schemas define the structure of your smart bites (metadata, entities, tables).
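To make the "infer from sample JSON" idea concrete, here is a minimal local sketch that derives a JSON-Schema-like description from one example output. It is not Bundata's inference logic, and the sample field names (`title`, `page_count`, `entities`) are invented for the example.

```python
def infer_schema(sample):
    """Derive a JSON-Schema-like description from a single example value."""
    if isinstance(sample, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(sample, int):
        return {"type": "integer"}
    if isinstance(sample, float):
        return {"type": "number"}
    if isinstance(sample, str):
        return {"type": "string"}
    if sample is None:
        return {"type": "null"}
    if isinstance(sample, list):
        # The first element stands in for all items; an empty list stays untyped.
        return {"type": "array", "items": infer_schema(sample[0]) if sample else {}}
    if isinstance(sample, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(value) for key, value in sample.items()},
        }
    raise TypeError(f"unsupported sample type: {type(sample).__name__}")


# Hypothetical example output for a report document.
sample_output = {
    "title": "Q3 Report",
    "page_count": 12,
    "entities": [{"name": "Acme Corp", "kind": "organization"}],
}
schema = infer_schema(sample_output)
```

A suggested schema like this is a starting point; it would normally be reviewed and edited (for example in Schema Studio) before running extraction.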

4. Run extraction

Run extraction on your documents. Bundata will:

  1. Partition and optionally chunk the content.
  2. Extract fields according to your schema.
  3. Apply enrichments (metadata, NER, image/table descriptions if enabled).
  4. Output structured, vector-ready smart bites.

In the UI, use Run Extraction. Via API, use the batch processing endpoints.
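The four stages above can be pictured with a small local stand-in. The sketch below is not how Bundata implements extraction: partitioning is a blank-line split, the "schema" is a mapping from field name to regular expression, and enrichment only attaches basic metadata. The `SmartBite` class and all function names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SmartBite:
    text: str
    fields: dict
    metadata: dict = field(default_factory=dict)


def partition(document: str) -> list:
    """Stage 1: split on blank lines (a toy stand-in for real partitioning)."""
    return [part.strip() for part in re.split(r"\n\s*\n", document) if part.strip()]


def extract(chunk: str, schema: dict) -> dict:
    """Stage 2: `schema` maps a field name to a regex with one capture group."""
    found = {}
    for name, pattern in schema.items():
        match = re.search(pattern, chunk)
        if match:
            found[name] = match.group(1)
    return found


def enrich(bite: SmartBite, source: str, index: int) -> SmartBite:
    """Stage 3: attach simple metadata, including where the chunk came from."""
    bite.metadata.update(
        {"source": source, "chunk_index": index, "char_count": len(bite.text)}
    )
    return bite


def run_extraction(document: str, schema: dict, source: str) -> list:
    """Stage 4: return one structured bite per chunk."""
    bites = []
    for index, chunk in enumerate(partition(document)):
        bite = SmartBite(text=chunk, fields=extract(chunk, schema))
        bites.append(enrich(bite, source, index))
    return bites


doc = "Invoice number: INV-1042\nTotal: 310.50 USD\n\nPayment is due within 30 days."
schema = {
    "invoice_number": r"Invoice number:\s*(\S+)",
    "total": r"Total:\s*([\d.]+)",
}
bites = run_extraction(doc, schema, source="invoice.pdf")
```

Keeping the source name and chunk index on each bite is a simple form of source lineage: every extracted field can be traced back to the part of the document it came from.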

5. Use the output

Send results to:

  • Vector Catalog — Index smart bites for search and RAG.
  • Agents — Feed structured context to agentic AI.
  • Workflows — Trigger downstream actions or export to your systems.
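Before sending results downstream, it can help to gate them on extraction confidence. The sketch below assumes each smart bite is a dictionary with a `confidence` value between 0 and 1; that field name, the record shape, and the 0.8 threshold are assumptions for illustration, not documented output.

```python
def split_by_confidence(bites, threshold=0.8):
    """Separate bites to index from bites that need review."""
    accepted, review = [], []
    for bite in bites:
        # A missing confidence counts as 0.0, so the bite goes to review.
        if bite.get("confidence", 0.0) >= threshold:
            accepted.append(bite)
        else:
            review.append(bite)
    return accepted, review


# Hypothetical extraction results.
results = [
    {"id": "a", "text": "Total: 310.50 USD", "confidence": 0.97},
    {"id": "b", "text": "Tota1: 3l0.5O", "confidence": 0.41},
    {"id": "c", "text": "Payment is due within 30 days."},
]
to_index, to_review = split_by_confidence(results)
```

Here `to_index` holds the bites that would go on to the Vector Catalog or an agent, while `to_review` holds those worth checking by hand first.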

Next steps