Open Source vs Enterprise

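Bundata offers both an open source library and a production-ready UI and API. The open source library is designed for quick prototyping and lacks several production features, such as compliance certifications, managed infrastructure, and advanced chunking; for production scenarios, use the Bundata UI or API.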

When to use which

Use case                               | Open source library | Bundata UI / API
Quick prototyping                      | Yes                 | Yes
Production workloads                   | No                  | Yes
Strict compliance (SOC 2, HIPAA, etc.) | No                  | Yes
Best extraction/chunking performance   | No                  | Yes
No-code or low-code pipelines          | No                  | Yes (UI)
Managed infra and dependencies         | No                  | Yes

Open source library: strengths and limits

The Bundata open source library gives you:

  • Local and flexible — Run partitioning and basic pipelines on your own machine or infrastructure.
  • Modular — Use only the pieces you need (e.g. partitioning, cleaning, chunking).
  • Connectors — Many source and destination connectors are available in the open source offering.
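To make the "modular" point concrete, here is a minimal sketch of composing only the stages you need. It does not use the Bundata library: `Element`, `partition`, `clean`, `chunk`, and `run_pipeline` are hypothetical stand-ins written for this illustration, and the real library's names and signatures may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Element:
    """Hypothetical stand-in for one extracted piece of a document."""
    text: str


# A stage is a plain function from a list of elements to a list of elements.
Stage = Callable[[List[Element]], List[Element]]


def partition(raw: str) -> List[Element]:
    """Toy partitioner: split raw text into elements on blank lines."""
    return [Element(block) for block in raw.split("\n\n") if block.strip()]


def clean(elements: List[Element]) -> List[Element]:
    """Toy cleaner: collapse runs of whitespace inside each element."""
    return [Element(" ".join(e.text.split())) for e in elements]


def chunk(elements: List[Element], max_chars: int = 40) -> List[Element]:
    """Toy chunker: greedily merge neighbouring elements up to max_chars."""
    chunks: List[Element] = []
    current = ""
    for e in elements:
        candidate = f"{current} {e.text}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(Element(current))
            current = e.text
        else:
            current = candidate
    if current:
        chunks.append(Element(current))
    return chunks


def run_pipeline(raw: str, stages: List[Stage]) -> List[Element]:
    """Partition the input, then apply only the stages the caller selected."""
    elements = partition(raw)
    for stage in stages:
        elements = stage(elements)
    return elements


raw_text = "Title\n\nFirst   paragraph\nof text.\n\nSecond paragraph."
only_cleaned = run_pipeline(raw_text, [clean])
cleaned_and_chunked = run_pipeline(raw_text, [clean, chunk])
```

Because each stage has the same shape, dropping or reordering a stage is a one-line change to the list passed to `run_pipeline`.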

Limitations compared to the UI/API include:

  • Not designed for production — No SLAs, limited scaling, and you operate and maintain the stack.
  • Lower performance — Document and table extraction are slower and less accurate than in the UI/API.
  • No access to latest models — No access to Bundata’s newest vision/OCR and fine-tuned models.
  • Limited chunking — No by-page or by-similarity chunking strategies; fewer options for optimal RAG.
  • No built-in embeddings — In the core library, embeddings are not built in (you can add them as a separate step or use the ingest CLI/library where supported).
  • No enrichment types — No image descriptions, table descriptions, or NER in the core open source offering.
  • No compliance — No SOC 2, HIPAA, GDPR, ISO 27001, FedRAMP, or similar; you are responsible for compliance.
  • No auth / identity — No authentication or identity management in the core open source offering for local processing.
  • No incremental loading — No built-in incremental data loading.
  • No job scheduling or monitoring — No ETL job scheduling or monitoring out of the box.
  • No image extraction — No image extraction from documents in the core library.
  • Weaker hierarchy detection — Less sophisticated document hierarchy detection.
  • You manage dependencies — You must install and manage libraries such as Poppler and Tesseract.
  • You manage infrastructure — For local processing, you provide and maintain your own infrastructure.
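Since system dependencies are your responsibility with the open source library, a small preflight check can catch a missing install early. The sketch below uses only the Python standard library and assumes that Poppler is detected through its `pdftoppm` utility and Tesseract through its `tesseract` executable; the helper names `find_tools` and `missing_tools` are hypothetical and not part of Bundata.

```python
import shutil
from typing import Dict, List, Optional

# Executable looked up for each system dependency (an assumption of this
# sketch): pdftoppm ships with Poppler's command-line utilities, and
# tesseract is the Tesseract OCR command-line program.
REQUIRED_TOOLS: Dict[str, str] = {"Poppler": "pdftoppm", "Tesseract": "tesseract"}


def find_tools(tools: Optional[Dict[str, str]] = None) -> Dict[str, Optional[str]]:
    """Map each dependency name to its resolved executable path, or None."""
    tools = REQUIRED_TOOLS if tools is None else tools
    return {name: shutil.which(exe) for name, exe in tools.items()}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of dependencies that were not found on PATH."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    missing = missing_tools(found)
    if missing:
        print("Install before running local pipelines: " + ", ".join(missing))
```

This only confirms that the executables are on `PATH`; it does not verify versions or language data packs.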

For production, compliance, best performance, and managed operations, use the Bundata UI or API.

Bundata UI and API: benefits

The Bundata UI and API provide:

  • Production use — Designed for reliability, scale, and operational support.
  • Higher performance — Better document and table extraction and faster transformation (e.g. multi-node serving and auto-scaling where applicable).
  • Latest models — Access to newer vision and OCR models and fine-tuned extraction.
  • Advanced chunking — By-page and by-similarity chunking, summary generation, and structured data generation.
  • Automatic selection — Automatic selection of embedding and chunking logic (e.g. a recommender), so you start with a strong retrieval setup.
  • Enrichment — Image descriptions, table descriptions, NER, and other enrichment types.
  • Security and compliance — SOC 2 Type 2, HIPAA, GDPR, ISO 27001, FedRAMP, and similar, where applicable. See the Bundata Trust Portal.
  • Auth and identity — Single sign-on (SSO) and identity management.
  • Incremental loading — Incremental data loading where supported.
  • Image extraction — Extract images from documents.
  • Smarter hierarchy — More sophisticated document hierarchy detection.
  • Managed dependencies — Bundata manages libraries like Tesseract and runtime dependencies.
  • Managed infrastructure — Bundata manages infrastructure, parallelization, and scaling.
  • UI and API — Use the no-code UI or the API for automation and integration.
  • Billing and usage — Real-time billing and usage dashboard where applicable.
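As a rough illustration of what a by-page chunking strategy means, the sketch below groups extracted elements so that no chunk crosses a page boundary. It is a conceptual example only, not Bundata's implementation: the `Element` class, its `page_number` field, and `chunk_by_page` are assumptions made for this illustration, and elements are assumed to arrive in reading order.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Element:
    """Hypothetical extracted element tagged with the page it came from."""
    text: str
    page_number: int


def chunk_by_page(elements: List[Element]) -> List[str]:
    """Join consecutive elements that share a page into one chunk per page."""
    chunks: List[str] = []
    # groupby only merges adjacent items, so input must be in reading order.
    for _, group in groupby(elements, key=lambda e: e.page_number):
        chunks.append("\n".join(e.text for e in group))
    return chunks


elements = [
    Element("Intro", 1),
    Element("Details", 1),
    Element("Table notes", 2),
]
page_chunks = chunk_by_page(elements)
```

By-similarity chunking would instead merge neighbouring elements based on how close their embeddings are, which requires an embedding model and is not shown here.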

Quickstart for open source

To try the open source library quickly, see the open source quickstart. For production, start with the quickstart for the Bundata UI or API.

Summary

  • Prototyping and learning — The open source library is a good starting point.
  • Production, compliance, and best performance — Use the Bundata UI or API.

For more detail on limits and features, see the Overview and API Reference.