Integrations overview

Bundata connects to your existing data sources, cloud storage, and downstream systems so you can ingest documents and deliver smart bites and embeddings where you need them. This page summarizes the source connectors, supported file types, cloud deployment options, destinations, and API integrations you can use to connect Bundata to the rest of your stack.

Data sources and connectors

Bundata supports 35+ source connectors so you can pull content from the tools your teams already use:

  • Cloud storage — Amazon S3, Azure Blob Storage, Google Cloud Storage
  • Collaboration and content — SharePoint, Confluence, Google Drive, Box
  • Business apps — Salesforce, Zendesk, and other CRM and support tools
  • Databases and data lakes — Where supported, connect to your existing data stores

Connectors are secure, scalable, and built for production. You can configure them in the Bundata UI or through the API. For cloud-specific setup, see the guides under Cloud deployments below.
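A connector definition is essentially a small structured document: a name, a connector type, and type-specific settings. The sketch below is hypothetical. `SourceConnector`, its fields, and the `"s3"` type string are illustrative assumptions, not Bundata's actual API schema (see the API Reference for the real format). It shows one way to keep credentials out of a stored definition by resolving them from environment variables only when the request payload is built.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class SourceConnector:
    """Hypothetical source-connector definition; field names are illustrative only."""
    name: str
    connector_type: str                       # e.g. "s3" (hypothetical identifier)
    settings: dict = field(default_factory=dict)
    # Maps a setting name to the environment variable that holds its secret value.
    secret_env_vars: dict = field(default_factory=dict)

    def to_payload(self) -> str:
        """Serialize to JSON, reading secrets from the environment at call time."""
        resolved = dict(self.settings)        # copy so secrets never land in self.settings
        for key, env_var in self.secret_env_vars.items():
            value = os.environ.get(env_var)
            if value is None:
                raise KeyError(f"environment variable {env_var!r} is not set")
            resolved[key] = value
        return json.dumps({"name": self.name, "type": self.connector_type, "settings": resolved})
```

For example, `SourceConnector("contracts", "s3", {"bucket": "my-contracts"}, {"secret_access_key": "AWS_SECRET_ACCESS_KEY"})` stores only the variable name, so the definition itself can be committed to version control without exposing the secret.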

File types and formats

Bundata supports 65+ file types, including:

  • Documents — PDF, DOCX, PPTX, HTML, Markdown
  • Images — TIFF, PNG, JPEG (with optional OCR and image description)
  • Emails and messaging — EML, MSG
  • Structured data — CSV, JSON, XML (where applicable)

Content is normalized into a consistent schema regardless of the source file type, so downstream systems always receive the same structure.
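To illustrate what normalization means in practice, the sketch below maps parser output that differs by format (some parsers emit `text`, others `body`) onto a single record type. `NormalizedElement`, its fields, and the input keys are hypothetical; Bundata's real output schema is not described on this page and may differ.

```python
from dataclasses import dataclass, asdict
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class NormalizedElement:
    """Hypothetical uniform record; Bundata's real schema may differ."""
    text: str
    element_type: str          # e.g. "title", "paragraph", "table" (illustrative)
    source_uri: str
    file_type: str             # lowercase extension without the dot, or "" if none
    page_number: Optional[int] = None


def normalize(raw: dict, source_uri: str) -> NormalizedElement:
    """Map a format-specific parser dict (hypothetical keys) onto the uniform record."""
    file_type = PurePosixPath(source_uri).suffix.lstrip(".").lower()
    text = raw.get("text") or raw.get("body") or ""
    return NormalizedElement(
        text=text.strip(),
        element_type=raw.get("type", "paragraph"),
        source_uri=source_uri,
        file_type=file_type,
        page_number=raw.get("page"),
    )
```

Whether the input was a PDF page or an email body, a consumer reads the same fields, which is the property the schema is meant to guarantee.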

Cloud deployments

Documentation is organized by cloud provider. Choose the guide that matches your environment:

| Platform | Description |
| --- | --- |
| Bundata on AWS | Ingest from S3, use IAM for access, and run Bundata in your AWS account or as a managed service. |
| Bundata on Azure | Connect to Azure Blob Storage, use Azure AD and managed identities, and integrate with the Microsoft ecosystem. |
| Bundata on Google Cloud | Use Google Cloud Storage, BigQuery, and GCP identity so document pipelines run natively on GCP. |

Use the cloud switcher or sidebar to move between AWS, Azure, and Google Cloud docs.

Destinations

After processing, send results to:

  • Vector Catalog — Bundata’s managed store for embeddings and semantic search
  • Your vector store — Export to your own vector database via destination connectors
  • Object storage — Write smart bites and metadata back to S3, Azure Blob, or GCS
  • Databases and data warehouses — Where supported, push structured output for analytics and BI

API and custom integrations

Use the Bundata API to build custom integrations:

  • Trigger pipelines from your CI/CD or internal tools
  • Batch-process files from local or remote storage
  • Integrate with MCP (Model Context Protocol), agents, and other AI tooling
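For batch processing from local storage, a client usually needs to find candidate files and group them before submission. The sketch below does only the local part. The extension set is a subset taken from the file types listed on this page (plus `.jpg`, the common JPEG extension); the full supported list is longer. The actual submission call is left out because its endpoint is not documented here.

```python
from pathlib import Path
from typing import Iterator, List

# Subset of the file types listed on this page; Bundata supports more.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".html", ".md",
    ".tiff", ".png", ".jpeg", ".jpg",
    ".eml", ".msg", ".csv", ".json", ".xml",
}


def iter_batches(root: Path, batch_size: int = 20) -> Iterator[List[Path]]:
    """Yield sorted batches of files under root whose extension is in SUPPORTED_EXTENSIONS."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # Extension check is case-insensitive, so "scan.PNG" is included.
    files = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
    for start in range(0, len(files), batch_size):
        yield files[start:start + batch_size]
```

A caller would loop over `iter_batches(Path("./inbox"))` and hand each batch to whatever upload call the API Reference specifies. Sorting keeps batch membership deterministic across runs, which makes retries easier to reason about.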

Authentication uses API keys; see Key concepts and API Reference.
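As a rough illustration of API-key authentication, for example when triggering a pipeline from CI/CD, the sketch below builds (but does not send) an authenticated request with the standard library. Everything Bundata-specific is an assumption: the base URL (`api.bundata.example`), the `/pipelines/{id}/runs` path, the `Bearer` header scheme, and the `BUNDATA_API_KEY` environment variable name are hypothetical placeholders, not the documented API.

```python
import json
import os
import urllib.request
from typing import Optional

# Hypothetical placeholder; take the real base URL and paths from the API Reference.
BASE_URL = "https://api.bundata.example/v1"


def build_trigger_request(pipeline_id: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST that would start a pipeline run (hypothetical endpoint)."""
    # Prefer an explicit key; fall back to an environment variable (e.g. a CI secret).
    key = api_key or os.environ.get("BUNDATA_API_KEY")
    if not key:
        raise RuntimeError("no API key provided and BUNDATA_API_KEY is unset")
    body = json.dumps({}).encode("utf-8")
    req = urllib.request.Request(f"{BASE_URL}/pipelines/{pipeline_id}/runs", data=body, method="POST")
    req.add_header("Authorization", f"Bearer {key}")  # assumed header scheme
    req.add_header("Content-Type", "application/json")
    return req
```

In a CI job, the key would come from the platform's secret store exposed as an environment variable, and the request would be sent with `urllib.request.urlopen(...)` once the real endpoint is filled in.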

Next steps