Integrations overview

Bundata connects to your existing data sources, cloud storage, and downstream systems so you can ingest documents and deliver smart bites and embeddings where you need them. This page gives an overview of how to connect Bundata to the rest of your stack.

Data sources and connectors

Bundata supports 35+ source connectors so you can pull content from the tools your teams already use:

Cloud storage — Amazon S3, Azure Blob Storage, Google Cloud Storage
Collaboration and content — SharePoint, Confluence, Google Drive, Box
Business apps — Salesforce, Zendesk, and other CRM and support tools
Databases and data lakes — Where supported, connect to your existing data stores

Connectors are secure, scalable, and built for production. Configure them in the Bundata UI or via the API. For cloud-specific setup, see the provider guides listed under Cloud deployments below.
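As a rough illustration of configuring a connector through the API rather than the UI, the sketch below builds a JSON payload for an S3 source connector. The `build_source_connector` helper and every field name (`type`, `name`, `settings`, `bucket`, `prefix`) are hypothetical and not taken from the Bundata API; check the API Reference for the actual schema.

```python
import json


def build_source_connector(name: str, bucket: str, prefix: str = "") -> dict:
    """Return a hypothetical source-connector definition for an S3 bucket.

    All keys are illustrative assumptions, not the documented Bundata schema.
    """
    if not bucket:
        raise ValueError("bucket must be a non-empty string")
    return {
        "type": "s3",
        "name": name,
        "settings": {"bucket": bucket, "prefix": prefix},
    }


# Example values are placeholders.
connector = build_source_connector("contracts", bucket="example-docs", prefix="legal/")
payload = json.dumps(connector, indent=2)
print(payload)
```

The payload deliberately carries no credentials; in this sketch, access is assumed to be granted separately, for example through the cloud provider's identity mechanism.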

File types and formats

Bundata supports 65+ file types, including:

Documents — PDF, DOCX, PPTX, HTML, Markdown
Images — TIFF, PNG, JPEG (with optional OCR and image description)
Emails and messaging — EML, MSG
Structured data — CSV, JSON, XML (where applicable)

Content is normalized into a consistent schema so downstream systems receive a uniform structure.
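To make the idea of a consistent schema concrete, here is a minimal sketch in which inputs of different file types are mapped to the same record shape. The `NormalizedElement` class, its fields, and the `normalize` function are assumptions chosen for illustration; they do not describe Bundata's actual output schema.

```python
from dataclasses import asdict, dataclass, field
from pathlib import PurePosixPath


@dataclass
class NormalizedElement:
    """Hypothetical uniform record; field names are assumptions."""

    source_file: str
    file_type: str
    text: str
    metadata: dict = field(default_factory=dict)


def normalize(source_file: str, raw_text: str) -> NormalizedElement:
    """Map any input file to the same record shape, keyed off its extension."""
    suffix = PurePosixPath(source_file).suffix.lstrip(".").lower()
    return NormalizedElement(
        source_file=source_file,
        file_type=suffix or "unknown",
        text=" ".join(raw_text.split()),  # collapse runs of whitespace
        metadata={"raw_char_count": len(raw_text)},
    )


records = [
    normalize("report.PDF", "Quarterly   results\n\nQ3"),
    normalize("inbox/message.eml", "Hello team"),
]
for record in records:
    print(asdict(record))
```

Because every record exposes the same keys, a downstream consumer in this sketch can process PDF and email content with one code path.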

Cloud deployments

Documentation is organized by cloud provider. Choose the guide that matches your environment:

Platform	Description
Bundata on AWS	Ingest from S3, use IAM for access, and run Bundata in your AWS account or as a managed service.
Bundata on Azure	Connect to Azure Blob Storage, use Microsoft Entra ID (formerly Azure AD) and managed identities, and integrate with the Microsoft ecosystem.
Bundata on Google Cloud	Use Google Cloud Storage, BigQuery, and GCP identity so document pipelines run natively on GCP.

Use the cloud switcher or sidebar to move between AWS, Azure, and Google Cloud docs.

Destinations

After processing, send results to:

Vector Catalog — Bundata’s managed store for embeddings and semantic search
Your vector store — Export to your own vector database via destination connectors
Object storage — Write smart bites and metadata back to S3, Azure Blob, or GCS
Databases and data warehouses — Where supported, push structured output for analytics and BI

API and custom integrations

Use the Bundata API to build custom integrations:

Trigger pipelines from your CI/CD or internal tools
Batch-process files from local or remote storage
Integrate with MCP, agents, and other AI tooling

The API authenticates requests with API keys; see Key concepts and the API Reference for details.
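A minimal sketch of triggering a pipeline from CI or an internal tool might look like the following. The base URL, the `/pipelines/{id}/runs` path, the `Bearer` header scheme, the request body, and the `BUNDATA_API_KEY` variable name are all assumptions made for illustration; the API Reference is the authority on the real endpoints and header format. The example only builds the request and does not send it.

```python
import json
import os
import urllib.request

# Placeholder base URL: an assumption for illustration only.
BASE_URL = "https://api.bundata.example/v1"


def build_trigger_request(pipeline_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would start a pipeline run."""
    body = json.dumps({"triggered_by": "ci"}).encode("utf-8")  # hypothetical body
    return urllib.request.Request(
        url=f"{BASE_URL}/pipelines/{pipeline_id}/runs",  # hypothetical path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # header scheme is an assumption
            "Content-Type": "application/json",
        },
    )


# Read the key from the environment instead of hard-coding it.
api_key = os.environ.get("BUNDATA_API_KEY", "demo-key")
request = build_trigger_request("pipeline-123", api_key)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

Keeping request construction separate from sending makes the call easy to unit-test in CI without network access.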

Next steps

Key concepts — Accounts, workspaces, schemas, and pipelines
Cloud: AWS · Azure · GCP — Provider-specific setup
API Reference — Programmatic access