Source connectors
Source connectors pull documents from your storage and apps into Bundata for context-aware extraction. Bundata supports 35+ source connectors, including cloud storage, collaboration tools, and business applications. Configure them in the Platform or via the API, then attach them to workflows for orchestration and continuous ingestion.
Cloud storage
| Connector | Use for |
|---|---|
| Amazon S3 | Buckets and prefixes; IAM or key-based auth. See AWS. |
| Azure Blob Storage | Containers; key or managed identity. See Azure. |
| Google Cloud Storage | Buckets; service account or key. See GCP. |
Use for contracts, invoices, and policy docs stored in object storage. Trigger workflows on a schedule or (where supported) on new object events.
Collaboration and content
| Connector | Use for |
|---|---|
| SharePoint | Sites, libraries, folders. OAuth or app registration. |
| Confluence | Spaces and pages. API token or OAuth. |
| Google Drive | My Drive and shared drives. OAuth. |
| Box | Folders and files. OAuth. |
Use for internal policies, playbooks, and shared documents. Sync on a schedule or, where supported, via events.
Business apps
| Connector | Use for |
|---|---|
| Salesforce | Attachments, content versions. OAuth. |
| Zendesk | Tickets, articles. API token or OAuth. |
Use for support tickets, contract attachments, and knowledge-base content. See Reference: Connectors for the full list and parameters.
Configuration
- Credentials — Each connector has an auth method (API key, OAuth, IAM). Store credentials securely; use least-privilege access. See Authentication.
- Scope — Configure which folders, buckets, or objects to read. Limit scope to reduce cost and improve security.
- File types — Connectors typically ingest only supported file types. Unsupported files are skipped or reported; check the run logs.
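The three settings above can be pictured as one configuration object. The following is a minimal sketch, not Bundata's actual API: the `SourceConnectorConfig` class, its field names, and the default file types are hypothetical, chosen only to illustrate keeping the credential in an environment variable, narrowing scope to a prefix, and declaring file types up front. See Reference: Connectors for the real parameter names.

```python
import os
from dataclasses import dataclass, field


@dataclass
class SourceConnectorConfig:
    """Hypothetical shape for a source connector configuration."""
    connector: str        # illustrative value such as "s3"
    bucket: str
    prefix: str           # narrow scope: one prefix, not the whole bucket
    secret_env_var: str   # name of the env var holding the credential
    file_types: list = field(default_factory=lambda: [".pdf", ".docx"])

    def validate(self) -> list:
        """Return a list of human-readable problems; empty means OK."""
        problems = []
        if not self.prefix.strip("/"):
            problems.append("scope covers the entire bucket; set a prefix")
        if not os.environ.get(self.secret_env_var):
            problems.append(f"credential env var {self.secret_env_var} is not set")
        if not self.file_types:
            problems.append("no file types listed")
        return problems
```

Calling `validate()` before a first run is one way to catch an unset credential or an unscoped bucket early, rather than discovering it in a failed connector run.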
Workflows
- Attach a source connector to a workflow so extraction runs on new or updated documents. See Triggers & scheduling.
- Ensure source lineage (document ID, path) is preserved so smart bites in the Vector Catalog are traceable. See Vector Catalog: Lineage.
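To illustrate what preserving lineage means in practice, here is a minimal sketch under stated assumptions. The `Chunk` type, the `with_lineage` helper, and the field names `document_id` and `source_path` are hypothetical and do not reflect the Vector Catalog's actual schema; the point is only that every extracted piece carries the identifiers of the document it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical extracted piece of a document, with lineage fields."""
    text: str
    document_id: str   # stable ID assigned by the source system
    source_path: str   # path or URL of the document in the source
    position: int      # order of the piece within the document


def with_lineage(document_id, source_path, pieces):
    """Attach the source document's ID and path to each extracted piece."""
    return [
        Chunk(text=piece, document_id=document_id,
              source_path=source_path, position=i)
        for i, piece in enumerate(pieces)
    ]
```

With lineage attached this way, any downstream result can be traced back to a specific document and location in the source, even after the pieces have been stored separately.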
Common pitfalls
- Expired or invalid credentials — Connector runs fail; rotate keys and update configuration. Monitor run history. See Monitoring.
- Overly broad scope — Ingesting entire tenants or buckets can be slow and expensive. Start with a single folder or prefix and expand as needed.
- Unsupported file types — Filter at source or handle “skipped” files in run results; see Supported file types.
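The last two pitfalls can be checked before a connector ever runs. The sketch below is an illustration only: `partition_by_support` is a hypothetical helper, and the `SUPPORTED_EXTENSIONS` set is a placeholder rather than Bundata's real list (see Supported file types). It narrows a file listing to one prefix and separates files likely to be ingested from those likely to be skipped.

```python
from pathlib import PurePosixPath

# Placeholder values; the authoritative list is in Supported file types.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def partition_by_support(paths, prefix="", supported=SUPPORTED_EXTENSIONS):
    """Split paths under `prefix` into (accepted, skipped) by file extension."""
    accepted, skipped = [], []
    for path in paths:
        if not path.startswith(prefix):
            continue  # outside the configured scope; ignore entirely
        suffix = PurePosixPath(path).suffix.lower()
        (accepted if suffix in supported else skipped).append(path)
    return accepted, skipped
```

Running this over a listing exported from the source gives a rough preview of how many files a given scope would pull in and how many would land in the "skipped" results.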
Next steps
- Destination connectors — Where to send output.
- Reference: Connectors — Full list and parameters.
- Workflows overview — Orchestrate ingestion and extraction.