Schema Studio overview
Schema Studio is the Bundata UI for designing and managing schemas: the definitions that control what schema-aware extraction produces. Schemas define field names, types (text, number, date, boolean, object, array), required vs. optional fields, and nested structure so your smart bites are consistent and usable by the Vector Catalog, RAG, and agents. This page explains what Schema Studio is, when to use it, how it fits the rest of the platform, and how to avoid common mistakes.
What Schema Studio is
Schema Studio is the visual editor for schemas. You can:
- Create and edit schemas — Add fields, set types, mark required or optional, and build nested objects and arrays.
- Preview and test — Run extraction on a sample document and see how output maps to your schema. Use this to iterate before scaling.
- Version schemas — Save versions so each extraction run is tied to a specific schema version. This supports source lineage and reprocessing. See Versioning.
Schemas are the contract between your documents and downstream systems: the Vector Catalog, your app, or an agent. Good schema design improves extraction confidence and makes smart bites reliable for search and grounded answers.
When to use Schema Studio
- Designing a new schema — Start in the UI so you can see field types and structure at a glance. Use for contracts, invoices, policy docs, and operational records.
- Iterating on an existing schema — Add optional fields, adjust types, or introduce nesting. Test with sample docs before rolling out. See Schema design.
- Reviewing and versioning — Before a big change, save the current schema as a version so you can compare or roll back. See Versioning.
For automation (CI/CD, bulk schema creation), use the API to create or update schemas; the same field types and rules apply. See Schema field reference and REST API overview.
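As a rough illustration of the automation path, the sketch below builds (but does not send) an HTTP request that would create a schema. The base URL, the `/schemas` path, the payload keys (`name`, `fields`), and the bearer-token header are all assumptions made for this example; take the real endpoint, payload shape, and authentication scheme from the REST API overview and the Schema field reference.

```python
import json
import urllib.request

# Hypothetical placeholder: not a real Bundata endpoint.
BASE_URL = "https://api.example.invalid/v1"


def build_create_schema_request(name, fields, api_key):
    """Build a POST request for creating a schema (assumed payload shape)."""
    body = json.dumps({"name": name, "fields": fields}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/schemas",  # assumed path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# Example: the same field definitions you would otherwise enter in the UI.
request = build_create_schema_request(
    name="invoice",
    fields=[
        {"name": "invoice_number", "type": "text", "required": True},
        {"name": "total", "type": "number", "required": True},
        {"name": "due_date", "type": "date", "required": False},
    ],
    api_key="YOUR_API_KEY",
)
```

Keeping request construction separate from sending makes the payload easy to inspect in a CI job before anything is created; once the real endpoint is known, the request can be sent with `urllib.request.urlopen(request)`.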
How it fits the rest of Bundata
- Extraction — Every extraction run uses a schema. The run produces smart bites that conform to that schema. See Extraction overview and Extraction runs.
- Vector Catalog — Collections can be tied to a schema so ingested content is validated and metadata is consistent. See Vector Catalog: Collections.
- Workflows — Workflow steps that run extraction reference a schema (by ID or version). See Workflows overview.
Schema Studio is the central place to define and evolve that contract so the whole document intelligence layer stays consistent.
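One way to picture the contract is as a reference that every extraction run carries: a schema ID plus a version. The sketch below is a hypothetical in-memory model, not Bundata's data model; the class and field names (`SchemaRef`, `ExtractionRun`, `runs_to_reprocess`) are invented for illustration. It shows how pinning a version on each run makes it possible to find the runs that need reprocessing after a schema change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaRef:
    """Hypothetical reference to one version of one schema."""
    schema_id: str
    version: int


@dataclass(frozen=True)
class ExtractionRun:
    """Hypothetical run record that pins the schema version it used."""
    run_id: str
    document_id: str
    schema: SchemaRef


def runs_to_reprocess(runs, schema_id, current_version):
    """Return runs of the given schema that used an older version."""
    return [
        run
        for run in runs
        if run.schema.schema_id == schema_id
        and run.schema.version < current_version
    ]


runs = [
    ExtractionRun("run-1", "doc-a", SchemaRef("invoice", 1)),
    ExtractionRun("run-2", "doc-b", SchemaRef("invoice", 2)),
    ExtractionRun("run-3", "doc-c", SchemaRef("contract", 1)),
]
stale = runs_to_reprocess(runs, schema_id="invoice", current_version=2)
print([run.run_id for run in stale])
```

Because each run records the exact version it used, a change to the invoice schema affects only the runs that are both on that schema and behind the current version.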
Field types and structure
- Primitives — Text, number, date, boolean. Use for titles, amounts, dates, and flags. See Field types.
- Objects — Nested key-value structure (e.g. address, party). Use when the document has a clear block of related fields.
- Arrays — Lists of values or objects (e.g. line items, sections). Use for repeated blocks. See Schema design for minimal examples (contracts, invoices).
Whether a field is required or optional affects run success and extraction confidence: a required field that is missing from a document can cause the run to fail. Prefer optional for fields that may be absent. See Schema design.
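To make the structure concrete, here is a minimal sketch of an invoice-style schema with primitives, a nested object, and an array of objects, plus a small checker that reports missing required fields and type mismatches. The dictionary layout (the `type`, `required`, `fields`, and `items` keys) and the `validate` helper are assumptions for illustration only; they are not Bundata's schema format or validation logic. See Schema field reference for the actual definitions.

```python
from datetime import date

# Hypothetical schema layout; key names are assumptions, not Bundata's format.
INVOICE_SCHEMA = {
    "invoice_number": {"type": "text", "required": True},
    "issue_date": {"type": "date", "required": True},
    "total": {"type": "number", "required": True},
    "paid": {"type": "boolean", "required": False},
    "bill_to": {
        "type": "object",
        "required": False,
        "fields": {
            "name": {"type": "text", "required": True},
            "city": {"type": "text", "required": False},
        },
    },
    "line_items": {
        "type": "array",
        "required": False,
        "items": {
            "type": "object",
            "fields": {
                "description": {"type": "text", "required": True},
                "amount": {"type": "number", "required": True},
            },
        },
    },
}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


PRIMITIVE_CHECKS = {
    "text": lambda v: isinstance(v, str),
    "number": _is_number,
    "date": lambda v: isinstance(v, date),
    "boolean": lambda v: isinstance(v, bool),
}


def _check_value(value, spec, where):
    kind = spec["type"]
    if kind == "object":
        if not isinstance(value, dict):
            return [f"{where}: expected object"]
        return validate(value, spec["fields"], path=f"{where}.")
    if kind == "array":
        if not isinstance(value, list):
            return [f"{where}: expected array"]
        problems = []
        for index, item in enumerate(value):
            problems.extend(_check_value(item, spec["items"], f"{where}[{index}]"))
        return problems
    if not PRIMITIVE_CHECKS[kind](value):
        return [f"{where}: expected {kind}"]
    return []


def validate(record, fields, path=""):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, spec in fields.items():
        where = f"{path}{name}"
        if record.get(name) is None:
            # Absent optional fields are fine; absent required fields are not.
            if spec.get("required", False):
                problems.append(f"{where}: missing required field")
            continue
        problems.extend(_check_value(record[name], spec, where))
    return problems
```

In this sketch, a record that omits `bill_to` and `paid` still conforms because both are optional, while a record that omits `total` is reported as a problem. That mirrors the trade-off described above: each required field is one more way for a document to fail.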
Common mistakes
- Too many required fields — Causes run failures when documents omit data. Prefer optional and validate downstream if needed.
- Schema doesn’t match document type — Using an invoice schema on contracts (or vice versa) hurts confidence and completeness. Use one schema per document type.
- Skipping versioning — Changing a schema without versioning makes it hard to know which runs used which structure and to reprocess consistently. See Versioning.
- No testing — Always run extraction on a few sample documents and inspect output before scaling. Use Schema Studio’s preview or the extraction API.
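The first and last mistakes above can be checked together on a handful of sample outputs. The sketch below assumes you have already collected extraction output for a few sample documents as plain dictionaries (for example, copied from the preview or returned by the extraction API; the exact output shape is an assumption here). It computes how often each field was actually filled and lists required fields that were sometimes empty, which are candidates for becoming optional.

```python
def field_fill_rates(outputs, field_names):
    """Fraction of sample outputs in which each field has a non-empty value."""
    if not outputs:
        raise ValueError("need at least one sample output")
    rates = {}
    for name in field_names:
        filled = sum(
            1 for output in outputs
            if output.get(name) not in (None, "", [], {})
        )
        rates[name] = filled / len(outputs)
    return rates


def risky_required_fields(outputs, required, threshold=1.0):
    """Required fields whose fill rate across the samples is below threshold."""
    rates = field_fill_rates(outputs, required)
    return sorted(name for name, rate in rates.items() if rate < threshold)


# Hypothetical sample outputs from three test documents.
samples = [
    {"invoice_number": "INV-1", "total": 120.5, "po_number": "PO-77"},
    {"invoice_number": "INV-2", "total": 80.0, "po_number": None},
    {"invoice_number": "INV-3", "total": 0, "po_number": ""},
]
print(risky_required_fields(samples, ["invoice_number", "total", "po_number"]))
```

Here `po_number` is present in only one of three samples, so marking it required would likely cause run failures at scale. A zero `total` still counts as filled, because only `None`, empty strings, empty lists, and empty objects are treated as missing.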
Next steps
- Schema design — Principles and real-world examples.
- Field types — All types and when to use them.
- Versioning — Schema versions and runs.
- Extraction overview — How extraction uses your schema.
- First schema — Build your first schema step-by-step.