Schemas
A schema is the labeling spec for one type of document. It defines what a reviewer captures when they label an artifact: the questions to answer, the answer types, which answers are mandatory, and the rules that decide whether a submission holds together. Most data programs end up with several schemas, one per document family, because a maintenance report and a quality certificate ask for different things.
A schema can start from a blank slate, be cloned from an existing one, or be seeded from a prebuilt industry template so a new document type is ready to label in minutes rather than days. From there, the rest of this page covers what a designer controls: the answer types available, how mandatory answers are enforced, the validation rules that run at submit time, and how a schema changes over its life without disturbing the work already done under it.
Field types
A schema designer builds the labeling form from seven answer types. Each one constrains what a reviewer (or the AI extractor) can enter, so the data that lands is typed at the source instead of cleaned up later.
- Choice list, where the answer must come from a fixed set the designer defines, such as Pass or Fail.
- Yes or no, for a single binary judgment.
- Whole number, for counts and integer measurements.
- Decimal number, for measured values that carry a fractional part.
- Rating scale, a bounded score such as 1 to 5, kept distinct from a plain number so the allowed range is part of the field rather than something a reviewer has to remember.
- Free text, for open notes and descriptions.
- Date.
The set is fixed at seven, and it is enforced at the storage layer, not just in the interface. A value that does not match its field's type is rejected before it is ever written, so a malformed answer cannot reach the database through a stray integration or a hand-built request. Every field also carries a human-readable label and an optional hint, so the person doing the work sees plain guidance rather than an internal code.
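As a rough illustration of typing at the source, the seven answer types and a per-type check might be sketched as follows. The names here (FieldType, Field, matches_type, the 1 to 5 default scale) are hypothetical and chosen only for this example, not the product's actual storage code.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Any, Optional


class FieldType(Enum):
    """The seven answer types described above (identifier names are assumed)."""
    CHOICE = "choice"
    YES_NO = "yes_no"
    WHOLE_NUMBER = "whole_number"
    DECIMAL = "decimal"
    RATING = "rating"
    FREE_TEXT = "free_text"
    DATE = "date"


@dataclass(frozen=True)
class Field:
    key: str                      # internal code, not shown to reviewers
    label: str                    # human-readable label
    type: FieldType
    hint: Optional[str] = None    # optional guidance text
    choices: tuple = ()           # only meaningful for CHOICE
    scale_min: int = 1            # only meaningful for RATING (assumed default)
    scale_max: int = 5


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so it is excluded explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def matches_type(field: Field, value: Any) -> bool:
    """Return True only if the value is well-formed for the field's type."""
    t = field.type
    if t is FieldType.CHOICE:
        return value in field.choices
    if t is FieldType.YES_NO:
        return isinstance(value, bool)
    if t is FieldType.WHOLE_NUMBER:
        return _is_int(value)
    if t is FieldType.DECIMAL:
        return _is_int(value) or isinstance(value, float)
    if t is FieldType.RATING:
        # the allowed range lives on the field, not in the reviewer's head
        return _is_int(value) and field.scale_min <= value <= field.scale_max
    if t is FieldType.FREE_TEXT:
        return isinstance(value, str)
    if t is FieldType.DATE:
        return isinstance(value, date)
    return False
```

In this sketch a write path would call matches_type before persisting anything, so a request that bypasses the interface is rejected by the same check.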
Required and optional fields
Each field is either mandatory or optional. The default is optional, so a designer marks the answers that matter as mandatory and leaves the rest free.
Mandatory does not mean a reviewer can never set work aside. A label can be saved as a draft with gaps. Mandatory enforcement runs only at submit, the moment the work is offered for review, and it blocks the submission until every required answer is present. The response names exactly which required answers are still missing, so the interface can point the reviewer straight at the gaps rather than making them hunt.
A field with a default value counts as answered. If a reviewer never touches it, the default fills in and the captured value is flagged as a default rather than a deliberate entry, so a mandatory field with a sensible default does not stall a submission and the audit trail still shows the value was not hand-entered.
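The submit-time behavior described above can be sketched in a few lines. This is a minimal illustration under assumed names (FieldSpec, submit, the "missing" and "defaulted" response keys), not the actual API: defaults are filled in first, flagged as defaults, and only then is the mandatory check applied.

```python
from dataclasses import dataclass
from typing import Any

_UNSET = object()  # sentinel meaning "this field has no default"


@dataclass(frozen=True)
class FieldSpec:
    key: str
    required: bool = False   # optional is the default
    default: Any = _UNSET


def _is_blank(value: Any) -> bool:
    # False and 0 are real answers; only None and empty text count as blank
    return value is None or (isinstance(value, str) and value.strip() == "")


def resolve_answers(fields, answers):
    """Fill in defaults for untouched fields and record which keys were defaulted."""
    values = dict(answers)
    defaulted = []
    for f in fields:
        if _is_blank(values.get(f.key)) and f.default is not _UNSET:
            values[f.key] = f.default
            defaulted.append(f.key)
    return values, defaulted


def submit(fields, answers):
    """Mandatory enforcement runs here, at submit, never on a draft save."""
    values, defaulted = resolve_answers(fields, answers)
    missing = [f.key for f in fields if f.required and _is_blank(values.get(f.key))]
    if missing:
        # name every gap so the interface can point the reviewer at it
        return {"ok": False, "missing": missing}
    return {"ok": True, "values": values, "defaulted": defaulted}
```

A draft save in this sketch would simply store the answers without calling submit, which is why a draft can hold gaps.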
Turning an optional field into a mandatory one is a breaking change to the spec, not an in-place edit to a field. Labels already captured may have no answer for that field, so making it mandatory would retroactively break them. A change like that goes through a new version, covered below.
Validation rules
Type enforcement keeps each answer well-formed on its own. Validation rules check answers against each other and against limits, and like mandatory enforcement they run only at submit. A draft can be saved in a state that would fail these rules. Submission is the gate.
Three kinds of rule are available.
- Conditional requirement. A field becomes mandatory only when another field holds a specific value. A failure-mode field can be left blank in general but required the moment the outcome is set to Fail. This avoids cluttering the form with fields that only matter in some cases.
- Range limit. A numeric answer must fall between a lower and an upper bound, either of which is optional. An answer outside the range is rejected, and an answer that is not a number at all is caught with a clear message.
- Format check. A free-text answer must match a defined pattern, useful for part numbers, serial codes, or any value with a fixed shape. A blank answer is not treated as a format failure, so a format check does not quietly double as a mandatory rule.
Each rule carries the message shown to the reviewer when it fails, so the wording can be set in the customer's terms. When a submission trips one or more rules, the response lists every field that failed and why, so the reviewer fixes them in one pass instead of submitting repeatedly. The same checks run when an existing label is edited and resubmitted.
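One way to picture the three rule kinds and the all-failures-at-once response is the sketch below. The class names, the error-map shape, and the wording of the not-a-number message are assumptions for illustration. Two further assumptions are marked in comments: range bounds are treated as inclusive, and a blank numeric answer is left to the mandatory check rather than failed by the range rule.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional


def is_blank(value: Any) -> bool:
    return value is None or value == ""


@dataclass(frozen=True)
class RequiredWhen:
    field: str
    other: str       # the field whose value triggers the requirement
    equals: Any
    message: str

    def error(self, answers: dict) -> Optional[str]:
        triggered = answers.get(self.other) == self.equals
        if triggered and is_blank(answers.get(self.field)):
            return self.message
        return None


@dataclass(frozen=True)
class RangeLimit:
    field: str
    message: str
    lower: Optional[float] = None   # either bound may be omitted
    upper: Optional[float] = None

    def error(self, answers: dict) -> Optional[str]:
        value = answers.get(self.field)
        if is_blank(value):
            return None  # assumption: blanks are handled by mandatory checks
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return f"{self.field} must be a number."
        # assumption: both bounds are inclusive
        if self.lower is not None and value < self.lower:
            return self.message
        if self.upper is not None and value > self.upper:
            return self.message
        return None


@dataclass(frozen=True)
class FormatCheck:
    field: str
    pattern: str
    message: str

    def error(self, answers: dict) -> Optional[str]:
        value = answers.get(self.field)
        if is_blank(value):
            return None  # a blank is not a format failure
        if not isinstance(value, str) or re.fullmatch(self.pattern, value) is None:
            return self.message
        return None


def validate(rules, answers: dict) -> dict:
    """Run every rule and collect all failures, keyed by field."""
    errors: dict = {}
    for rule in rules:
        message = rule.error(answers)
        if message is not None:
            errors.setdefault(rule.field, []).append(message)
    return errors
```

Because validate never stops at the first failure, a single response can carry every field that needs attention, which matches the one-pass fix described above.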
Schema versions
A schema is not a single document that gets overwritten. It owns an ordered chain of versions, numbered from one, and each version is a frozen snapshot of the fields and rules as they stood. The first version is created with the schema itself.
This is the design decision that matters most to an auditor or an ML team. Every label is pinned to the exact version it was captured under, and that link never moves. A label captured under version 3 stays a version-3 label for life. Publishing version 4 does not re-interpret, re-validate, or re-tag a single piece of earlier work. Past labels keep meaning what they meant when a human signed off on them, which is what makes a labeled dataset defensible months later.
Small, non-breaking adjustments do not need a new version at all. The label shown to reviewers, the hint text, the default value, and the display order of a field can be changed in place on the current version. Concurrent edits are detected and surfaced before they overwrite each other, so two people tuning the same schema cannot silently clobber one another's work. What cannot change in place is anything that would alter the meaning of existing answers: a field's type, whether it is mandatory, or the question it asks. Those are breaking, and they go through a new version.
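The split between in-place edits and breaking changes could be sketched as below. The page does not say how concurrent edits are detected, so this example assumes a simple revision counter (optimistic concurrency) purely for illustration. The attribute names and exception classes are likewise hypothetical.

```python
from dataclasses import dataclass

# Attributes that can change on the current version without a new one
IN_PLACE_ATTRIBUTES = frozenset({"label", "hint", "default", "display_order"})


class BreakingChange(Exception):
    """Raised when an edit would alter the meaning of existing answers."""


class EditConflict(Exception):
    """Raised when someone else changed the version since it was read."""


@dataclass
class CurrentVersion:
    fields: dict        # field key -> dict of attributes
    revision: int = 0   # assumed mechanism: bumped on every in-place edit


def edit_in_place(version, field_key, changes, expected_revision):
    """Apply a non-breaking edit, or refuse without touching anything."""
    if expected_revision != version.revision:
        raise EditConflict(
            f"expected revision {expected_revision}, found {version.revision}"
        )
    breaking = sorted(set(changes) - IN_PLACE_ATTRIBUTES)
    if breaking:
        raise BreakingChange(f"needs a new version: {', '.join(breaking)}")
    version.fields[field_key].update(changes)
    version.revision += 1
    return version.revision
```

Both checks run before any attribute is written, so a refused edit leaves the version exactly as it was.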
Migrating between versions
Publishing a new version is an explicit act. The new version starts as a full copy of the current one, then the requested changes are applied in order: add a field, change a field, or remove a field. The version number increments automatically, so numbering stays clean and gap-free. Each new version carries a short summary of what changed, and that summary, along with who published it and when, is recorded in the audit log in the same step as the change, so the version history is a reliable record rather than a best-effort note.
Validation rules move forward with the schema. A new version inherits the previous version's rules unless the designer supplies a replacement set, in which case the new rules apply from that version onward. Older versions keep the rules they were published with.
Promotion affects new work only. The schema's current version moves to the new one, so all new labeling work picks it up automatically, while every label already captured stays on the version it was tagged under. No reviewer has to redo finished work, and no historical dataset shifts under a team's feet because a field was added upstream.
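The publish flow described in this section (copy the current version, apply changes in order, increment the number, inherit or replace rules, write the audit entry, leave old labels alone) might look roughly like this. All names are hypothetical. In this in-memory sketch "the same step" simply means the version and the audit entry are appended together after every change has applied cleanly, and a real system would presumably use a transaction.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SchemaVersion:
    number: int
    fields: dict     # field key -> attributes; treated as read-only once published
    rules: tuple
    summary: str


@dataclass
class Label:
    document_id: str
    version: int     # pinned at capture time and never updated
    answers: dict


@dataclass
class Schema:
    versions: list
    audit_log: list = field(default_factory=list)

    @property
    def current(self) -> SchemaVersion:
        return self.versions[-1]


def publish(schema, changes, summary, author, rules=None):
    """Create the next version from a full copy of the current one."""
    base = schema.current
    fields = copy.deepcopy(base.fields)      # older versions stay untouched
    for op, key, spec in changes:            # applied in the order given
        if op == "add":
            if key in fields:
                raise ValueError(f"field already exists: {key}")
            fields[key] = dict(spec)
        elif op == "change":
            fields[key] = {**fields[key], **spec}
        elif op == "remove":
            del fields[key]
        else:
            raise ValueError(f"unknown operation: {op}")
    new_rules = base.rules if rules is None else tuple(rules)  # inherit unless replaced
    new = SchemaVersion(base.number + 1, fields, new_rules, summary)
    schema.versions.append(new)
    schema.audit_log.append({
        "version": new.number,
        "summary": summary,
        "published_by": author,
        "published_at": datetime.now(timezone.utc).isoformat(),
    })
    return new
```

Nothing in publish touches a Label, which is the point: a label's version field is set once when it is captured and is never rewritten by a later publish.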
Schema deactivation
There is no hard delete for a schema. A schema that has been used to label real work is part of the provenance of that work, and deleting it would orphan the labels that point at its versions. Instead a schema is retired, and retirement is reversible.
Retirement is also guarded. A schema cannot be retired while it is still doing active work. If any live connector is set to label incoming documents with this schema, or any active routing rule sends work to it, the request is refused and the response names the specific connectors and rules that are still pointing at it, so an admin can unwind those bindings first. Inactive connectors and rules are ignored, since they are not routing anything. When a retirement is combined with other edits in the same request, the guard runs first, so the request either fully succeeds or fully fails with nothing half-applied.
A retired schema drops out of the normal schema picker so it cannot be chosen for new work by mistake, though it can still be listed on demand for review. Labels that reference its versions stay fully queryable. Bringing it back is a single step with no guard, and retiring an already-retired schema or reactivating an already-active one does nothing rather than erroring, so the controls are safe to call more than once.
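The retirement guard and the repeat-safe controls can be sketched as follows. The types and function names (Binding, SchemaInUse, retire, reactivate) are assumptions made for this example. It covers only the standalone retire and reactivate calls, not a request that combines retirement with other edits.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    """A connector or routing rule that points at a schema."""
    name: str
    schema_id: str
    active: bool


@dataclass
class SchemaRecord:
    id: str
    active: bool = True


class SchemaInUse(Exception):
    """Refusal that names what is still pointing at the schema."""

    def __init__(self, connectors, rules):
        super().__init__(
            f"schema still in use by connectors={connectors} rules={rules}"
        )
        self.connectors = connectors
        self.rules = rules


def _live(bindings, schema_id):
    # inactive bindings are ignored because they route nothing
    return [b.name for b in bindings if b.active and b.schema_id == schema_id]


def retire(schema, connectors, routing_rules) -> bool:
    """Retire the schema; return False if it was already retired."""
    if not schema.active:
        return False                      # repeat calls do nothing
    live_connectors = _live(connectors, schema.id)
    live_rules = _live(routing_rules, schema.id)
    if live_connectors or live_rules:
        raise SchemaInUse(live_connectors, live_rules)
    schema.active = False                 # no delete: the record stays queryable
    return True


def reactivate(schema) -> bool:
    """Bring a retired schema back; no guard applies in this direction."""
    if schema.active:
        return False
    schema.active = True
    return True
```

Returning a boolean instead of raising on a repeat call is one way to make the controls safe to call more than once, and the caller can still tell whether anything changed.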