Build your first agent

This tutorial walks you through connecting an agent to Bundata so it can deliver grounded answers from your documents. Agentic AI needs more than raw data; it needs enriched, structured information. Bundata prepares your enterprise data through RAG enablement, structured data extraction, and metadata enrichment, so agents can reason over schema-aware outputs and vector-ready smart bites with source lineage.

Why agents need Bundata

Bundata powers agent workflows by:

  • RAG enablement — Turning raw documents into clean, contextually aware chunks optimized for retrieval-augmented generation (RAG), so your agents get the right context at the right time.
  • Structured data extraction — Extracting key document elements, tables, and entities from contracts, invoices, and operational records into structured JSON for downstream processing and autonomous agents.
  • File metadata enrichment — Enriching every file with source details, timestamps, and semantic tags so agentic systems can make confident, traceable decisions.

Agents built on Bundata are grounded by structured outputs: they reason over schema-aware fields and smart bites, not just messy raw text. Use cases include AI copilots (customer support, legal review, R&D), agents that reason across multiple documents, and internal onboarding assistants. Product overview: Product → Agents. Platform: Platform → Agents.
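To make "schema-aware fields and smart bites" concrete, here is a minimal sketch of what one smart bite record might look like. The field names (`text`, `metadata`, `lineage`, and so on) are illustrative assumptions for this tutorial, not Bundata's actual schema:

```python
# Illustrative shape of a smart bite: an enriched chunk with metadata and
# source lineage. Field names are assumptions, not Bundata's actual schema.
smart_bite = {
    "text": "Payment is due within 30 days of invoice date.",
    "embedding_dim": 1536,  # a vector is stored alongside the text
    "metadata": {
        "document_type": "contract",
        "tags": ["payments", "terms"],
        "ingested_at": "2024-05-01T12:00:00Z",
    },
    "lineage": {"document_id": "doc_123", "run_id": "run_456", "page": 4},
}

print(sorted(smart_bite))  # → ['embedding_dim', 'lineage', 'metadata', 'text']
```

The `lineage` block is what later lets an agent cite the original file and extraction run.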

Prerequisites

  • A Bundata account and access to the platform. See Sign up and Quickstart.
  • Content already in the Vector Catalog (smart bites from extraction and ingestion). If not, complete First extraction and ingestion so at least one collection has data.
  • (Optional) An LLM or agent framework you will call from your app; this tutorial focuses on the Bundata side (catalog and search).

Step 1: Index content into the Vector Catalog

  1. Run extraction on your documents (contracts, invoices, policies, operational docs) using a schema so output is schema-aware. See First extraction.
  2. Ensure extraction or ingestion writes to a Vector Catalog collection. Collections hold the smart bites and embeddings that Vector Search queries. See Vector Catalog overview and Collections.
  3. Confirm the collection has data (e.g. via the platform at /platform/catalog or the API). Without indexed content, search returns nothing and the agent has no context.
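Since an empty collection silently yields an agent with no context, it can help to fail fast. A minimal guard might look like this; the count itself would come from the platform or catalog API, which is not shown here:

```python
def ensure_collection_ready(collection_id: str, count: int) -> None:
    """Fail fast if a collection has nothing indexed. `count` is assumed to
    come from the catalog API or platform UI; fetching it is not shown."""
    if count == 0:
        raise RuntimeError(
            f"Collection {collection_id!r} has no indexed smart bites; "
            "run extraction/ingestion before pointing an agent at search."
        )

ensure_collection_ready("col_contracts", 42)  # passes silently when data exists
```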
Step 2: Connect the agent to Bundata search

  1. In your application or agent framework, configure the agent to retrieve context from Bundata instead of (or in addition to) other sources.
  2. Point the agent at the Bundata search API or at your app’s search layer that queries the catalog. Use the search endpoint with the appropriate collection ID and optional metadata filters. See Vector Search overview and API reference.
  3. For each user question, your flow will: call Bundata search with the question (or an embedding), get ranked smart bites, and pass the top chunks to the LLM as context. See Grounding from catalog.

You can use the platform Agents UI at Platform → Agents to configure and test an agent that uses the catalog.

Step 3: Retrieve context for each question

  1. When the user asks a question, call the Vector Search API (or your app’s search layer) with the question text or an embedding.
  2. Request a limited number of top results (e.g. 5–10) so the LLM receives a focused context window.
  3. Pass the returned smart bites (text + metadata) to your LLM as context for generating the answer. The model can now produce grounded answers that cite the retrieved content.
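The steps above can be sketched as a small context-assembly helper: take the top-ranked results, keep a focused number of chunks, and join their text (with a lineage tag) into one block for the LLM. Result key names (`text`, `lineage`, `document_id`) are illustrative assumptions:

```python
def build_context(results: list[dict], max_chunks: int = 5) -> str:
    """Join the top-ranked smart bites into one context block for the LLM.
    Assumes each result carries 'text' and 'lineage' keys (illustrative names)."""
    chunks = []
    for r in results[:max_chunks]:
        doc = r.get("lineage", {}).get("document_id", "unknown")
        chunks.append(f"[{doc}] {r['text']}")
    return "\n\n".join(chunks)

results = [
    {"text": "Payment due in 30 days.", "lineage": {"document_id": "doc_1"}},
    {"text": "Late fees accrue at 1.5%.", "lineage": {"document_id": "doc_2"}},
]
print(build_context(results))
```

Capping `max_chunks` keeps the prompt focused and avoids flooding the model's context window with marginal matches.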

In Bundata, search results include source lineage (document and run identifiers) so you can trace answers back to the original file.

Step 4: Surface source lineage in responses

  1. Include source lineage in the agent’s response (e.g. “Based on document X, section Y”) so users can verify answers and auditors can trace decisions. See Agents grounding.
  2. Optionally store or display document IDs, run IDs, or links so users can open the source document. This builds trust and supports compliance.
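A minimal citation formatter might render the lineage fields into the user-facing string, for example (key names `document_id` and `run_id` are assumptions from this tutorial, not a guaranteed schema):

```python
def format_citation(lineage: dict) -> str:
    """Render source lineage as a human-readable citation. Key names
    ('document_id', 'run_id') are illustrative, not a fixed Bundata schema."""
    cite = f"Based on document {lineage['document_id']}"
    if "run_id" in lineage:
        cite += f" (run {lineage['run_id']})"
    return cite

print(format_citation({"document_id": "doc_123", "run_id": "run_456"}))
# → Based on document doc_123 (run run_456)
```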

What happens behind the scenes

Bundata does not run the LLM; it provides the document intelligence layer: extraction produces smart bites, the Vector Catalog stores and indexes them, and Vector Search returns relevant chunks with metadata and lineage. Your agent combines that context with the user question, calls the LLM, and returns a grounded answer. Outputs are ready for production: connect agents to the catalog, workflows, search, and downstream delivery systems. See Agents overview and Core capabilities on the website.
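Putting the pieces together, the whole loop can be sketched as one function. `search_fn` and `llm_fn` stand in for your own integrations (the Bundata search call and whatever LLM you use); neither is a real Bundata or LLM API:

```python
def answer_question(question: str, search_fn, llm_fn, top_k: int = 5) -> dict:
    """End-to-end sketch: Bundata supplies the context, your LLM produces the
    answer. search_fn and llm_fn are placeholders for your integrations."""
    bites = search_fn(question)[:top_k]            # 1) retrieve ranked smart bites
    context = "\n\n".join(b["text"] for b in bites)
    prompt = (f"Answer using only the context below.\n\n{context}\n\n"
              f"Question: {question}")
    return {                                       # 2) generate a grounded answer
        "answer": llm_fn(prompt),
        "sources": [b["lineage"]["document_id"] for b in bites],
    }

# Stubbed usage: replace the lambdas with real search and LLM calls.
fake_search = lambda q: [{"text": "Payment due in 30 days.",
                          "lineage": {"document_id": "doc_1"}}]
fake_llm = lambda prompt: "Payment is due within 30 days."
out = answer_question("When is payment due?", fake_search, fake_llm)
print(out["sources"])  # → ['doc_1']
```

Returning `sources` alongside `answer` is what makes the lineage from Step 4 available to the UI.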

Common mistakes

  • Searching an empty collection — Ensure extraction and ingestion have run and written to the catalog before pointing the agent at search.
  • Ignoring source lineage — Always surface source and document in the UI or response so answers are traceable and trustworthy.
  • No metadata filters — Use metadata filters (e.g. document type, date) to scope search so the agent gets the most relevant context (e.g. only contracts, or only recent policies).
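To illustrate the last point, here is a client-side version of metadata scoping; in practice you would push the filter into the search request itself rather than filter after the fact, and the `metadata` key names are again illustrative:

```python
def apply_filters(results: list[dict], **required) -> list[dict]:
    """Keep only smart bites whose metadata matches every required key/value.
    Client-side illustration only; real scoping belongs in the search request."""
    return [r for r in results
            if all(r.get("metadata", {}).get(k) == v
                   for k, v in required.items())]

docs = [
    {"text": "NDA clause", "metadata": {"document_type": "contract"}},
    {"text": "Q3 revenue", "metadata": {"document_type": "report"}},
]
print(len(apply_filters(docs, document_type="contract")))  # → 1
```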

Next steps