Drift detection

What drift detection watches

A labeled dataset is only useful while it reflects the work coming in. When the mix of values reviewers see starts to diverge from the mix they accepted in the past, the model that learned from that history begins to mislabel, and the quality of new data degrades quietly. Piprio watches for that divergence so a quality lead hears about it before it spreads through the dataset.

Once a day, Piprio compares two pictures of every active labeling spec in the account. The first picture is the accepted history: the values reviewers signed off on over the last 90 days. The second is what is arriving now: the values the model has proposed on new documents over the last 30 days. When a value's share of the work moves materially between those two pictures, Piprio raises an alert and routes it to the people who can act on it.

The comparison runs on choice fields, the ones where a value comes from a fixed set of options (pass or fail, the defect type, the inspection outcome). Free-text, numeric, and yes-or-no fields are not part of the comparison, because a shifting distribution is only meaningful where the options are countable. If a field has no incoming proposals in the window, it is skipped that day rather than flagged.

Default thresholds

Drift is measured as the gap, in percentage points, between a value's share of the accepted history and its share of the incoming work. The thresholds are plain and fixed:

A value whose share moves by more than 20 points raises a warning.
A value whose share moves by more than 40 points raises a critical alert.

A worked example: if "fail" accounted for 15 percent of the accepted inspections over the baseline window but now makes up 50 percent of what the model is proposing, that is a 35-point move, which raises a warning. The same field swinging far enough to clear 40 points would raise a critical alert instead. The product's own tests pin this 15-to-50 case as a warning, so the boundary behaves as described.
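
The banding above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Piprio's implementation: the function names, the list-of-values inputs, and the choice to score only values that already exist in the accepted history are all hypothetical. Only the 20-point and 40-point bands come from the text.

```python
from collections import Counter

WARNING_POINTS = 20.0   # documented: a move of more than 20 points warns
CRITICAL_POINTS = 40.0  # documented: a move of more than 40 points is critical


def shares(values):
    """Return each value's share of the list, in percent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}


def classify_shift(accepted, proposed):
    """Map each baseline value to (gap in points, severity or None)."""
    if not accepted or not proposed:
        return {}  # nothing to compare, so nothing is flagged
    baseline, incoming = shares(accepted), shares(proposed)
    result = {}
    # Assumption: only values already present in the baseline are scored here.
    for value, base_share in baseline.items():
        gap = abs(incoming.get(value, 0.0) - base_share)
        if gap > CRITICAL_POINTS:
            severity = "critical"
        elif gap > WARNING_POINTS:
            severity = "warning"
        else:
            severity = None
        result[value] = (gap, severity)
    return result


accepted = ["fail"] * 15 + ["pass"] * 85   # 15 percent fail in the baseline
proposed = ["fail"] * 50 + ["pass"] * 50   # 50 percent fail in incoming work
print(classify_shift(accepted, proposed)["fail"])  # (35.0, 'warning')
```

Both comparisons are strict, matching the "more than" wording of the thresholds.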

A second kind of alert fires when a value shows up in incoming work that never once appeared in the accepted history. That is always a warning. It only fires once some accepted history exists to compare against, so a brand-new field with no labeled past does not flood the account with alerts on its first day of use.
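
A minimal sketch of the new-value check, assuming both windows are available as plain lists of values. The function name and return shape are hypothetical. The empty-baseline guard mirrors the rule that nothing fires until some accepted history exists.

```python
from collections import Counter


def new_values(accepted, proposed):
    """Return {value: incoming count} for values absent from the accepted history."""
    if not accepted:
        return {}  # no labeled past yet, so nothing is flagged
    seen = set(accepted)
    return dict(Counter(v for v in proposed if v not in seen))


print(new_values(["pass", "fail"], ["pass", "scratch", "scratch"]))  # {'scratch': 2}
```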

A third kind of alert, the volume spike, watches ingestion rate rather than label content. When the count of documents pulled in over the trailing week is more than one and a half times the average of the prior four weeks, Piprio raises the alert. It fires only once at least four full weeks of history exist, so a new account with thin data does not trip it. This signal points at the source, not the model: a sudden surge usually means something changed in the connector or the upstream pipeline, not in the labeling.
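
The volume rule reduces to a single comparison. The sketch below assumes weekly document counts are supplied oldest first, with the trailing week last, and it reads "four full weeks of history" as the four weeks before the trailing week. Both the input shape and that reading are assumptions.

```python
VOLUME_MULTIPLIER = 1.5  # documented: more than one and a half times the average
BASELINE_WEEKS = 4       # documented: the prior four weeks


def volume_spike(weekly_counts):
    """weekly_counts: documents ingested per week, oldest first, trailing week last."""
    if len(weekly_counts) < BASELINE_WEEKS + 1:
        return False  # not enough history to judge
    trailing = weekly_counts[-1]
    prior = weekly_counts[-(BASELINE_WEEKS + 1):-1]
    average = sum(prior) / BASELINE_WEEKS
    return trailing > VOLUME_MULTIPLIER * average


print(volume_spike([100, 100, 100, 100, 151]))  # True
print(volume_spike([100, 100, 100, 100, 150]))  # False
```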

Configuring alerts

There is nothing to tune. The thresholds, the 90-day baseline window, the 30-day incoming window, the volume multiplier, and the four-week minimum are fixed across every account. A buyer evaluating Piprio against a build-it-yourself approach should read this as deliberate simplicity: drift monitoring works the same way for every tenant from day one, with no dashboard of dials to calibrate and no per-team configuration to get wrong. Changing the bands is a product change, not a customer setting. If a procurement review needs configurable thresholds, raising that need during evaluation puts the requirement on the record.

Alerts are stored per tenant, inside the isolated schema that holds the rest of the account's data, so one customer's drift signals are never visible to another. Piprio keeps at most one open alert per field per kind at a time. If the same drift condition is still true tomorrow, no second copy of the same alert is stacked on the first. A repeat run while an alert is still open produces nothing new, which the product's deduplication test confirms.
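
The one-open-alert rule can be pictured as a set keyed on field and kind. This is a sketch only: the in-memory set, the function name, and the kind label "distribution_shift" are hypothetical stand-ins for whatever the tenant schema actually stores.

```python
def record_new_alerts(open_keys, detected):
    """Add alerts for detected (field, kind) pairs that have no open alert yet.

    open_keys is mutated in place; the return value lists what was newly raised.
    """
    created = []
    for key in detected:
        if key not in open_keys:
            open_keys.add(key)
            created.append(key)
    return created


open_keys = set()
condition = [("inspection_outcome", "distribution_shift")]
print(record_new_alerts(open_keys, condition))  # [('inspection_outcome', 'distribution_shift')]
print(record_new_alerts(open_keys, condition))  # []
```

The second call stands in for the next day's run: the condition is still true, but the open alert blocks a duplicate.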

Rollup behavior

The daily run walks every account and every active labeling spec, catches and logs any single failure so one bad spec cannot stop the rest of the run, and reports how many specs it checked and how many new alerts it raised.
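
The failure isolation described above follows a common pattern: wrap each spec's check in its own try block. The sketch below assumes a hypothetical check_spec callable that returns the number of new alerts it raised, and it assumes a failed spec is not counted as checked. Neither detail is confirmed by the text.

```python
import logging

logger = logging.getLogger("drift")


def run_daily(specs, check_spec):
    """Check every spec; log and skip any that fail; report the totals."""
    checked = 0
    new_alerts = 0
    for spec in specs:
        try:
            new_alerts += check_spec(spec)
            checked += 1
        except Exception:
            # One bad spec must not stop the rest of the run.
            logger.exception("drift check failed for spec %s", spec)
    return {"specs_checked": checked, "new_alerts": new_alerts}
```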

When a run turns up new alerts for a labeling spec, two things follow. A drift event is published to any subscribed webhooks, carrying the spec's name, the number of alerts, and which kinds of drift were found, so an external system can pick them up. Separately, the account owners get a notification.
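
For a receiving system, the drift event might look like the dictionary built below. The three pieces of information (spec name, alert count, kinds found) come from the description above. Every key name, the event name, and the kind labels are assumptions made for illustration, not the documented payload.

```python
def build_drift_event(spec_name, alert_kinds):
    """alert_kinds: one kind label per new alert raised for the spec in this run."""
    return {
        "event": "drift.detected",           # hypothetical event name
        "spec_name": spec_name,
        "alert_count": len(alert_kinds),
        "kinds": sorted(set(alert_kinds)),   # each kind listed once
    }


event = build_drift_event("weld-inspection", ["new_value", "distribution_shift", "new_value"])
print(event["alert_count"], event["kinds"])  # 3 ['distribution_shift', 'new_value']
```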

When a single run raises exactly one alert for a spec, the notification on the Action Feed deep-links to that specific alert on the quality dashboard. When a run raises several at once, the notification links to the full drift list instead, since there is no single row to point at.
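
The link rule is a simple branch on the number of new alerts. In this sketch the URL paths are invented placeholders and the function name is hypothetical. Only the one-versus-many behavior is taken from the text.

```python
def notification_target(new_alert_ids):
    """Pick where the notification should link for one spec's new alerts."""
    if not new_alert_ids:
        return None  # no new alerts, so no notification
    if len(new_alert_ids) == 1:
        return f"/quality/drift/{new_alert_ids[0]}"  # placeholder deep link
    return "/quality/drift"  # placeholder path for the full drift list


print(notification_target(["a17"]))         # /quality/drift/a17
print(notification_target(["a17", "a18"]))  # /quality/drift
```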

On the dashboard itself, alerts roll up across every labeling spec in the account. Asking for the drift alerts with no filter returns the open alerts from every spec at once, and a quality lead can narrow to a single spec on demand. Results come back newest first, with critical alerts ahead of warnings and the spec name as the final tiebreaker, and the list can be filtered to just warnings or just critical alerts. Only open, unacknowledged alerts appear, so the list stays a working queue rather than a growing archive. The aggregation test confirms three alerts spread across two specs come back as three with no filter and as the right per-spec subset when one is named.
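
The listing behavior can be sketched as a filter followed by a stable sort. The Alert fields and function name here are hypothetical, and the sort precedence (creation time first, then severity, then spec name) is one reading of the description rather than a confirmed ordering.

```python
from dataclasses import dataclass
from datetime import datetime

SEVERITY_RANK = {"critical": 0, "warning": 1}


@dataclass(frozen=True)
class Alert:
    spec_name: str
    severity: str            # "warning" or "critical"
    created_at: datetime
    acknowledged: bool = False


def list_open_alerts(alerts, spec_name=None, severity=None):
    """Return open alerts, optionally narrowed to one spec or one severity."""
    rows = [
        a for a in alerts
        if not a.acknowledged
        and (spec_name is None or a.spec_name == spec_name)
        and (severity is None or a.severity == severity)
    ]
    # Python's sort is stable, so sort by the least significant key first.
    rows.sort(key=lambda a: a.spec_name)
    rows.sort(key=lambda a: SEVERITY_RANK[a.severity])
    rows.sort(key=lambda a: a.created_at, reverse=True)
    return rows


alerts = [
    Alert("welds", "warning", datetime(2024, 5, 1)),
    Alert("welds", "critical", datetime(2024, 5, 1)),
    Alert("paint", "warning", datetime(2024, 5, 2)),
]
print(len(list_open_alerts(alerts)))                     # 3
print(len(list_open_alerts(alerts, spec_name="welds")))  # 2
```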

Acting on a drift alert

Every alert carries a plain-language message and the numbers behind it. A distribution-shift alert names which value moved, what its share was in the accepted history, what its share is now, and the size of the gap. A new-value alert names the value that appeared and how many incoming documents carried it. The message is written to be read by a person, so a quality lead can triage without pulling up the raw figures.

What the alert points toward depends on its kind. A distribution shift or a brand-new value usually means the population being sent for review no longer matches what the model learned from, so its proposals have started to drift. The fix is upstream of labeling: a quality lead checks whether the source documents, the intake process, or the model itself changed. A volume signal points at the connector rather than the model, so its message asks for a check of the connector for unexpected activity.

An alert is cleared by acknowledging it. Acknowledging stamps who acted and when, drops the alert off the open queue, and releases the hold that kept the same alert from repeating. If the same condition still holds, the next daily run can raise it again, so acknowledging is a "seen and handled" action, not a permanent mute.
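
A small sketch of what acknowledging does to an alert's state. The field names, the kind label, and the can_raise helper are assumptions. The point is that the stamp and the release of the hold are the same event: once an alert carries an acknowledgement time, it no longer blocks a fresh alert of the same field and kind.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Alert:
    field_name: str
    kind: str
    acknowledged_by: Optional[str] = None
    acknowledged_at: Optional[datetime] = None

    @property
    def is_open(self):
        return self.acknowledged_at is None


def acknowledge(alert, user, now=None):
    """Stamp who acted and when; the alert then leaves the open queue."""
    alert.acknowledged_by = user
    alert.acknowledged_at = now or datetime.now(timezone.utc)


def can_raise(alerts, field_name, kind):
    """A new alert is allowed only if no open alert matches the field and kind."""
    return not any(
        a.is_open and a.field_name == field_name and a.kind == kind
        for a in alerts
    )


alerts = [Alert("inspection_outcome", "distribution_shift")]
print(can_raise(alerts, "inspection_outcome", "distribution_shift"))  # False
acknowledge(alerts[0], "dana")
print(can_raise(alerts, "inspection_outcome", "distribution_shift"))  # True
```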

Reading alerts and acting on them carry different access. Anyone on the account, down to a Viewer or Reviewer, can see the drift list, because awareness of quality signals belongs to everyone doing the work. Acknowledging an alert is reserved for the roles that own quality decisions: Senior Reviewer, Data Steward, Admin, and Owner. A plain Viewer or Reviewer can read an alert but cannot close it.
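
The access split can be expressed as two checks. The role names are the ones listed above. The function names and the idea of passing the role as a string are illustrative assumptions.

```python
ACKNOWLEDGE_ROLES = {"Senior Reviewer", "Data Steward", "Admin", "Owner"}
ROLES_NAMED_ABOVE = {"Viewer", "Reviewer"} | ACKNOWLEDGE_ROLES


def can_view_alerts(role):
    """Every role on the account can read the drift list."""
    return role in ROLES_NAMED_ABOVE


def can_acknowledge(role):
    """Only the roles that own quality decisions can close an alert."""
    return role in ACKNOWLEDGE_ROLES


print(can_view_alerts("Reviewer"), can_acknowledge("Reviewer"))          # True False
print(can_view_alerts("Data Steward"), can_acknowledge("Data Steward"))  # True True
```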