Bundata on AWS
This page explains how to run Bundata on Amazon Web Services (AWS) and integrate it with AWS services. Bundata can ingest from and deliver to AWS storage, and it authenticates through AWS identity services, so you can keep document intelligence workloads within your AWS environment.
Overview
Bundata on AWS supports:
- Ingestion from S3 — Read documents from S3 buckets (and optionally other AWS data sources) for partitioning, extraction, and enrichment.
- Identity and access — Use IAM roles and credentials so Bundata accesses only the resources you allow.
- Deployment options — Run Bundata as a managed service that connects to your AWS account, or deploy in your own VPC for full control.
Documentation is organized by cloud provider. Use the main docs navigation to switch between AWS, Azure, and Google Cloud as needed.
Getting started
- Sign up for Bundata — Create an account and choose a plan that supports AWS connectivity.
- Configure an S3 source — In the Bundata UI or via API, add an S3 connector with your bucket name, region, and credentials (or IAM role).
- Run a pipeline — Define a schema, run extraction and enrichment, and send results to Vector Catalog, another S3 bucket, or a different destination.
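Before adding an S3 connector, it can help to sanity-check the values the second step asks for (bucket name, region, and either an IAM role or credentials). The sketch below is only an illustration: `S3SourceSettings` and its field names are hypothetical and do not come from the Bundata API, so check the API Reference for the real connector schema. The bucket check covers a subset of the S3 naming rules, and the region check validates the shape of the string only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Subset of the S3 general-purpose bucket naming rules: 3-63 characters,
# lowercase letters, digits, dots, and hyphens, starting and ending with a
# letter or digit.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
# Shape check only (e.g. "us-east-1"); it does not confirm the region exists.
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")
ROLE_ARN_RE = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


@dataclass(frozen=True)
class S3SourceSettings:
    """Hypothetical settings object; field names are illustrative only."""

    bucket: str
    region: str
    prefix: str = ""
    role_arn: Optional[str] = None
    access_key_id: Optional[str] = None
    secret_access_key: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means nothing was flagged."""
        problems: List[str] = []
        if not BUCKET_RE.match(self.bucket) or ".." in self.bucket:
            problems.append(f"invalid bucket name: {self.bucket!r}")
        if not REGION_RE.match(self.region):
            problems.append(f"region does not look like an AWS region: {self.region!r}")
        if self.role_arn and not ROLE_ARN_RE.match(self.role_arn):
            problems.append("role_arn is not a well-formed IAM role ARN")

        has_any_key = bool(self.access_key_id or self.secret_access_key)
        has_key_pair = bool(self.access_key_id and self.secret_access_key)
        if has_any_key and not has_key_pair:
            problems.append("access_key_id and secret_access_key must be set together")
        if self.role_arn and has_any_key:
            problems.append("set either role_arn or access keys, not both")
        if not self.role_arn and not has_any_key:
            problems.append("no credentials: set role_arn or an access key pair")
        return problems
```

Calling `validate()` on a settings object with a valid bucket, region, and role ARN returns an empty list; anything else returns one message per problem, which is easier to surface in a form or CLI than a single exception.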
Data and storage
- Source documents — Store raw PDFs, DOCX, and other files in S3. Bundata reads from the buckets and prefixes you configure.
- Output — Write smart bites, embeddings, and metadata back to S3, or to Bundata’s Vector Catalog and other supported destinations.
- Security — Use bucket policies and IAM to restrict access. Prefer IAM roles over long-lived access keys where possible.
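One way to apply the security guidance above is a read-only IAM policy scoped to a single bucket and prefix. The following is a minimal sketch that builds such a policy document as a Python dictionary. The bucket and prefix values are placeholders, and the exact set of S3 actions Bundata needs is an assumption here (`s3:ListBucket` and `s3:GetObject` are the usual minimum for reading objects), so confirm the required permissions in your Bundata account documentation.

```python
import json
from typing import Any, Dict


def read_only_prefix_policy(bucket: str, prefix: str = "") -> Dict[str, Any]:
    """Build an IAM policy allowing list and read access under one prefix."""
    prefix = prefix.strip("/")
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"{bucket_arn}/{prefix}/*" if prefix else f"{bucket_arn}/*"

    # s3:ListBucket applies to the bucket itself; the s3:prefix condition
    # narrows listing to keys under the configured prefix.
    list_statement: Dict[str, Any] = {
        "Sid": "ListConfiguredPrefix",
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": bucket_arn,
    }
    if prefix:
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}/*"]}}

    # s3:GetObject applies to object ARNs, not to the bucket ARN.
    read_statement: Dict[str, Any] = {
        "Sid": "ReadConfiguredPrefix",
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": object_arn,
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, read_statement]}


if __name__ == "__main__":
    # Placeholder bucket and prefix; replace with your own values.
    print(json.dumps(read_only_prefix_policy("my-docs-bucket", "incoming/"), indent=2))
```

Attaching a policy like this to an IAM role, rather than to a user with long-lived access keys, follows the preference stated above. If results are written back to S3, a separate statement granting `s3:PutObject` on the output prefix would be needed.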
Integration with AWS services
- Amazon S3 — Primary storage for source and (optionally) output data.
- IAM — Authentication and authorization for S3 and other AWS APIs used by Bundata.
- VPC — For in-VPC or private-link deployments, Bundata can run inside your VPC for low-latency and locked-down access.
For detailed networking, compliance, and region options, refer to your Bundata account documentation or contact support.
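When a managed service in another AWS account needs access to your resources, the common AWS pattern is an IAM role whose trust policy names the external account and requires an external ID. The sketch below builds such a trust policy. Whether Bundata uses this pattern, which account ID it connects from, and how it issues external IDs are all assumptions here; the values shown are placeholders, not real Bundata identifiers.

```python
import re
from typing import Any, Dict


def cross_account_trust_policy(trusted_account_id: str, external_id: str) -> Dict[str, Any]:
    """Build a trust policy letting one external account assume the role.

    Both arguments are placeholders you would obtain from the service
    provider; nothing here is specific to Bundata.
    """
    if not re.fullmatch(r"\d{12}", trusted_account_id):
        raise ValueError("AWS account IDs are 12 digits")
    if not external_id:
        raise ValueError("external_id must not be empty")

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID guards against the confused-deputy problem:
                # the caller must present the value agreed for your account.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }
```

The trust policy controls only who may assume the role. What the role can do once assumed is defined separately by the permissions policies attached to it.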
Try Bundata on AWS
- Quickstart — Run your first pipeline with an S3 source.
- API Reference — Configure connectors and batch jobs via API.
- Overview — Key functionality and use cases.
Other clouds
- Bundata on Azure — Microsoft Azure Blob Storage and Azure-native integration.
- Bundata on Google Cloud — Google Cloud Storage and GCP integration.