Knowledge assistants
Knowledge assistants are AI copilots that answer questions and perform tasks using your documents—contracts, policies, procurement records, legal filings. Bundata supplies the document intelligence layer: context-aware extraction, smart bites in the Vector Catalog, and Vector Search so the assistant can retrieve relevant context and deliver grounded answers with source lineage.
What you need from Bundata
- Extraction — Documents turned into schema-aware chunks with metadata and optional enrichment (NER, table description). See Extraction overview.
- Vector Catalog — Indexed smart bites and embeddings. See Vector Catalog overview.
- Vector Search — Semantic retrieval so the assistant gets the right chunks for each user question. See Vector Search overview and Grounding from catalog.
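To make the flow concrete, here is a minimal, self-contained sketch of the extract → index → search pipeline. It does not use Bundata's actual client API (those names are not shown on this page); the chunk fields and the bag-of-words "embedding" are stand-ins so the sketch runs on its own, while a real catalog stores dense embeddings from an embedding model.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words vector so the sketch runs without a model;
    # in practice the Vector Catalog holds dense model embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Smart bites": schema-aware chunks with metadata and source lineage.
# Field names here are illustrative, not Bundata's schema.
catalog = [
    {"text": "Termination of this agreement requires 30 days written notice.",
     "source": "msa_vendor_y.pdf", "section": "12. Termination"},
    {"text": "Refunds are issued within 14 days of an approved return.",
     "source": "refund_policy.md", "section": "Refunds"},
]
for bite in catalog:
    bite["embedding"] = embed(bite["text"])

def vector_search(question: str, k: int = 1) -> list[dict]:
    q = embed(question)
    return sorted(catalog, key=lambda b: cosine(q, b["embedding"]), reverse=True)[:k]

hits = vector_search("What is the termination clause?")
print(hits[0]["source"], "-", hits[0]["section"])  # → msa_vendor_y.pdf - 12. Termination
```

Note that each retrieved bite carries its `source` and `section` along with the text, which is what lets the assistant attach lineage to its answers later.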
Use cases
- Contract assistant — “What’s the termination clause in this agreement?” Query the catalog for the contract, retrieve relevant sections, and ground the answer with citations.
- Policy assistant — “What is our refund policy?” Search policy docs, return matching sections, and let the LLM summarize with source links.
- Legal / compliance — “Which contracts mention liability cap X?” Filter by metadata and search; return a list with source lineage for review.
- Procurement / ops — “What did we agree with vendor Y?” Search contracts and invoices; ground answers in extracted terms and line items.
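The legal/compliance case above combines a metadata filter with a text match. A hedged in-memory sketch (the document fields and `find_mentions` helper are hypothetical, not part of Bundata's API):

```python
documents = [
    {"text": "Liability is capped at the fees paid in the prior 12 months.",
     "doc_type": "contract", "source": "msa_vendor_y.pdf", "section": "9. Liability"},
    {"text": "Liability cap discussions are summarized in the appendix.",
     "doc_type": "memo", "source": "notes_2024.md", "section": "Appendix"},
    {"text": "Payment is due within 30 days of invoice.",
     "doc_type": "contract", "source": "order_form.pdf", "section": "4. Payment"},
]

def find_mentions(term: str, doc_type: str) -> list[dict]:
    # Apply the cheap metadata filter first, then match the text; each hit
    # keeps its source lineage so reviewers can open the exact section.
    return [
        {"source": d["source"], "section": d["section"]}
        for d in documents
        if d["doc_type"] == doc_type and term.lower() in d["text"].lower()
    ]

hits = find_mentions("liability", doc_type="contract")
# → [{"source": "msa_vendor_y.pdf", "section": "9. Liability"}]
```

The memo that also mentions a liability cap is excluded by the `doc_type` filter, which is exactly why strong metadata matters: it keeps the review list scoped to contracts.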
Design patterns
- Index by use case — Use separate collections (or strong metadata) for contracts, policies, and invoices so the assistant can scope queries. See Collections.
- Retrieve then generate — Run Vector Search with the user’s question, pass top-k smart bites to the LLM, and generate an answer. Always attach source lineage to the response. See Grounding from catalog.
- Filter when possible — Use metadata (date, document type, source) so retrieval is relevant. See Vector Search: Filtering.
- Confidence and quality — Prefer bites with high extraction confidence; filter out or flag low-confidence content so the assistant doesn’t ground on unreliable text. See Extraction best practices.
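The retrieve-then-generate and confidence patterns above can be sketched together: filter low-confidence bites, assemble a prompt with numbered citations, and return the lineage alongside it. The bite shape, the `confidence` scale, and the threshold are assumptions for illustration, not Bundata's response format.

```python
MIN_CONFIDENCE = 0.8  # assumed 0–1 extraction-confidence scale

bites = [  # top-k results as a vector search might return them (shape assumed)
    {"text": "Termination requires 30 days written notice.",
     "source": "msa_vendor_y.pdf", "section": "12. Termination", "confidence": 0.95},
    {"text": "Ternimation reqires 3O days...",  # garbled low-confidence OCR text
     "source": "scan_004.pdf", "section": "?", "confidence": 0.41},
]

def build_prompt(question: str, bites: list[dict]) -> tuple[str, list[dict]]:
    # Drop unreliable bites, then number the rest so the LLM can cite [n].
    usable = [b for b in bites if b["confidence"] >= MIN_CONFIDENCE]
    context = "\n".join(
        f'[{i + 1}] ({b["source"]}, {b["section"]}) {b["text"]}'
        for i, b in enumerate(usable)
    )
    prompt = (
        "Answer using only the sources below. Cite sources as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # Source lineage travels with the response, never just the answer text.
    lineage = [{"source": b["source"], "section": b["section"]} for b in usable]
    return prompt, lineage

prompt, lineage = build_prompt("What is the termination clause?", bites)
# `prompt` goes to the LLM; `lineage` is attached to the assistant's answer.
```

Returning `lineage` as structured data (rather than hoping the LLM echoes it) is what makes the citations in the final answer verifiable.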
Common pitfalls
- Stale catalog — Schedule ingestion with workflow orchestration so new and updated docs stay searchable.
- No citations — Always show which document and (if possible) section each part of the answer came from.
- One-size-fits-all collection — Use multiple collections or strong metadata so the assistant doesn’t mix unrelated doc types.
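On the stale-catalog pitfall: the usual fix is to key catalog entries by document id and upsert on re-ingestion, so an updated document replaces its stale entry instead of coexisting with it. Bundata's workflow orchestration handles this for you; the sketch below only illustrates the idempotent-upsert idea with hypothetical fields.

```python
catalog: dict[str, dict] = {}  # entries keyed by document id

def upsert(doc_id: str, version: int, text: str, collection: str) -> None:
    # Replace the stale entry rather than appending a duplicate, so
    # search never surfaces superseded content.
    current = catalog.get(doc_id)
    if current and current["version"] >= version:
        return  # already up to date; out-of-order re-ingestion is a no-op
    catalog[doc_id] = {"version": version, "text": text, "collection": collection}

upsert("msa_vendor_y", 1, "Termination requires 60 days notice.", "contracts")
upsert("msa_vendor_y", 2, "Termination requires 30 days notice.", "contracts")

print(len(catalog), catalog["msa_vendor_y"]["version"])  # → 1 2
```

The version guard makes re-runs safe: replaying an old ingestion event cannot clobber a newer document.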
Next steps
- Grounding from catalog — Retrieve and pass context to the LLM.
- First agent — Build your first agent.
- Vector Search best practices — Relevance and troubleshooting.