Bundata on Google Cloud
This guide covers running Bundata on Google Cloud Platform (GCP) and integrating it with GCP services. Bundata can ingest from and deliver to Google Cloud Storage and other GCP services, so document processing and its outputs can stay within your GCP environment.
Overview
Bundata on GCP supports:
- Ingestion from Cloud Storage — Read documents from GCS buckets for partitioning, extraction, and enrichment.
- Google Cloud identity — Use service accounts and IAM so Bundata accesses only the resources you allow.
- Deployment options — Run Bundata as a managed service that connects to your GCP project, or deploy in your own VPC for full control.
Documentation is organized by cloud provider. Use the main docs navigation to switch between AWS, Azure, and Google Cloud as needed.
Getting started
- Sign up for Bundata — Create an account and choose a plan that supports GCP connectivity.
- Configure a GCS source — In the Bundata UI or via API, add a Google Cloud Storage connector with your bucket name, an optional prefix, and credentials (a service account key or workload identity).
- Run a pipeline — Define a schema, run extraction and enrichment, and send results to Vector Catalog, another GCS bucket, BigQuery, or a different destination.
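The connector step above amounts to naming a bucket, an optional prefix, and an identity. The following is a minimal sketch of assembling such a definition from a `gs://` URI. The payload shape and every field name (`type`, `bucket`, `prefix`, `auth`, `service_account`) are hypothetical and are not taken from the Bundata API; consult the API Reference for the actual schema. The bucket and service account names are placeholders.

```python
import json
from typing import Tuple
from urllib.parse import urlsplit


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/some/prefix/' into (bucket, prefix)."""
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # The object prefix is the path without its leading slash.
    return parts.netloc, parts.path.lstrip("/")


def build_gcs_connector(uri: str, service_account_email: str) -> dict:
    """Assemble a connector definition. All field names are hypothetical."""
    bucket, prefix = parse_gcs_uri(uri)
    return {
        "type": "gcs",
        "bucket": bucket,
        "prefix": prefix,
        "auth": {"service_account": service_account_email},
    }


connector = build_gcs_connector(
    "gs://example-docs/raw/pdfs/",
    "bundata-reader@example-project.iam.gserviceaccount.com",
)
print(json.dumps(connector, indent=2))
```

Keeping the bucket and prefix as separate fields makes it easier to scope a connector to one folder of a shared bucket.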
Data and storage
- Source documents — Store raw PDFs, DOCX, and other files in Google Cloud Storage. Bundata reads from the buckets and prefixes you configure.
- Output — Write smart bites, embeddings, and metadata back to GCS, or to Bundata’s Vector Catalog and other supported destinations (e.g. BigQuery for analytics).
- Security — Use service account keys or workload identity with least-privilege IAM roles. Prefer workload identity where possible.
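As an illustration of least privilege, the sketch below builds IAM policy bindings that give one service account read access to a source bucket and create-only access to an output bucket. `roles/storage.objectViewer` and `roles/storage.objectCreator` are predefined Cloud Storage roles, and the `role`/`members` shape follows the IAM policy binding format. Which permissions Bundata actually requires is an assumption here, so check your account documentation before applying anything like this. The function name and bucket names are hypothetical.

```python
from typing import Dict, List

READ_ROLE = "roles/storage.objectViewer"    # get and list objects
WRITE_ROLE = "roles/storage.objectCreator"  # create objects only


def least_privilege_bindings(
    service_account_email: str, source_bucket: str, output_bucket: str
) -> Dict[str, List[dict]]:
    """Return per-bucket IAM bindings: read on the source, create on the output."""
    member = f"serviceAccount:{service_account_email}"
    bindings: Dict[str, List[dict]] = {}
    # setdefault keeps both bindings if source and output are the same bucket.
    bindings.setdefault(source_bucket, []).append(
        {"role": READ_ROLE, "members": [member]}
    )
    bindings.setdefault(output_bucket, []).append(
        {"role": WRITE_ROLE, "members": [member]}
    )
    return bindings


plan = least_privilege_bindings(
    "bundata-pipeline@example-project.iam.gserviceaccount.com",
    source_bucket="example-docs",
    output_bucket="example-results",
)
for bucket, bucket_bindings in plan.items():
    for binding in bucket_bindings:
        print(bucket, binding["role"], binding["members"][0])
```

Granting roles on individual buckets rather than on the whole project keeps the service account away from unrelated data.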
Integration with GCP services
- Google Cloud Storage — Primary storage for source and (optionally) output data.
- IAM — Authentication and authorization for GCS and other GCP APIs used by Bundata.
- BigQuery — Optionally export structured output or embeddings for analytics and ML.
- VPC — For private connectivity, Bundata can run inside your VPC, which gives low latency and locked-down access.
For detailed networking, compliance, and region options, refer to your Bundata account documentation or contact support.
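For the BigQuery option, one common way to get records into a table is newline-delimited JSON, which BigQuery load jobs accept. The sketch below serializes a few output records into that format. It does not describe Bundata's own export mechanism, and the record fields (`doc_id`, `chunk`, `text`, `embedding`) are hypothetical stand-ins for whatever your pipeline produces.

```python
import json
from typing import Iterable


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


# Hypothetical output records; an embedding maps naturally to a repeated FLOAT column.
records = [
    {"doc_id": "a1", "chunk": 0, "text": "Invoice total\n42.00", "embedding": [0.1, 0.2]},
    {"doc_id": "a1", "chunk": 1, "text": "Payment terms", "embedding": [0.3, 0.4]},
]

ndjson = to_ndjson(records)
print(ndjson, end="")
```

A file in this format can be staged in a GCS bucket and loaded into a table whose schema matches the record fields.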
Try Bundata on GCP
- Quickstart — Run your first pipeline with a GCS source.
- API Reference — Configure connectors and batch jobs via API.
- Overview — Key functionality and use cases.
Other clouds
- Bundata on AWS — Amazon S3 and AWS-native integration.
- Bundata on Azure — Azure Blob Storage and Azure-native integration.