Quickstart

This page traces the shortest path from a finished install to a first labeled artifact: sign in, connect a source, define a labeling spec, accept a proposed label, and pull the result out as a dataset. It assumes the guided installer has already run and reported success. For the install itself, see docs/getting-started/self-hosted-install.md. For the vocabulary used below, including what an artifact and a schema are, see docs/getting-started/concepts.md.

Ten minutes is the target for this walkthrough on the happy path, not a guarantee. It holds when AI assistance was configured at install, the first document is a small file that arrives in seconds, and a prebuilt schema template fits the document type. Larger files take longer, and files above the AI-extraction size limit go straight to a person with no proposal to accept. The steps below note where that matters.

Before starting

Three things need to be true before the first sign-in.

The installer has finished and printed a pass at every stage. If that is in doubt, the bundled health check confirms every service is up and answering.

bash scripts/health-check.sh

The login URL, the first administrator's email, and the one-time password generated by the installer are saved somewhere they can be retrieved. The installer prints all three on its final screen and shows the password only once. The first sign-in uses those credentials.

A source the administrator can reach is in mind: an object-storage bucket, a network share, or a document-management system that holds a few files of one kind. The connector built in the next step points at it.

Piprio has no public sign-up. The organization (the per-customer account that owns every document, schema, person, and connector) was created during install along with the first administrator, so there is no separate account-creation step. Opening the login URL and signing in with the printed credentials lands the administrator inside that organization.

The first sign-in changes the one-time password to a real one, then drops the administrator into a short guided setup that walks through the next three steps in order. The setup can be left at any point and the same work picked up later from the main navigation. An administrator who would rather work from the full pages can skip the guide and go straight to the source-connector, schema, and review surfaces.

Connect a source

A connector is a read-only link to where the documents already live. Piprio does not copy files into itself. It records the path and reads the bytes on demand, so the originals stay in the customer's systems. The setup's first step creates one connector. The administrator picks the source type, supplies the location and the access credentials, and saves. Credentials are encrypted before they are written and are read back only at scan time.

Once the connector is saved, Piprio scans the source and turns each matching file into an artifact, the unit a reviewer labels. A small source of a few files scans in seconds. The full set of source types, their authentication, and how repeat scans avoid duplicates are covered in docs/data/connectors.md.
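The record-the-path, read-on-demand idea can be sketched in a few lines of Python. This is an illustration only, not Piprio's implementation: it assumes a local directory stands in for the source, and the names ArtifactRef and scan are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ArtifactRef:
    """Hypothetical artifact record: it stores a path, never the file contents."""
    path: Path

    def read_bytes(self) -> bytes:
        # Bytes are fetched from the source only when someone asks for them.
        return self.path.read_bytes()


def scan(root: str, suffix: str) -> list:
    """Turn each matching file under root into an ArtifactRef (assumed matching rule: file suffix)."""
    matches = sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())
    return [ArtifactRef(p) for p in matches]
```

Because the record holds only a path, deleting the record never touches the original file, which is the property the read-only connector is meant to give.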

Define a schema

A schema is the labeling spec for one kind of document. It names the values to capture, the type of each, and which are required. The fast path is to start a schema from a prebuilt industry template that already matches a common document type, so the first labeling form is ready in minutes rather than built field by field. A schema can also start blank or be cloned from an existing one.
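As a rough mental model, a schema can be pictured as a list of named, typed fields with a required flag. The Python sketch below is an assumption-laden illustration, not Piprio's schema format: SchemaField, INVOICE_SCHEMA, and check_values are hypothetical names, and plain Python types stand in for the product's answer types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaField:
    name: str
    kind: type               # stand-in for an answer type (assumption)
    required: bool = False


# Hypothetical schema for one kind of document.
INVOICE_SCHEMA = [
    SchemaField("vendor", str, required=True),
    SchemaField("total", float, required=True),
    SchemaField("po_number", str),
]


def check_values(schema, values):
    """Return a list of problems; an empty list means the values fit the schema."""
    problems = []
    for f in schema:
        value = values.get(f.name)
        if value is None:
            if f.required:
                problems.append(f"{f.name}: required value is missing")
        elif not isinstance(value, f.kind):
            problems.append(f"{f.name}: expected {f.kind.__name__}")
    return problems
```

For example, check_values(INVOICE_SCHEMA, {"vendor": "Acme"}) reports the missing total, while optional fields such as po_number can be left out without complaint.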

In the guided setup this schema is attached to the connector from the previous step, so newly scanned files of that kind are labeled against it. The seven answer types, the validation rules, and how a schema changes over time without disturbing finished work are covered in docs/data/schemas.md.

With a connector and a schema in place, the setup runs a scan. When AI assistance is on, extraction proposes a value for each field the schema asks for, and the artifact's status moves to AI Proposed. This runs in the background, so it never blocks the screen. A small file is usually proposed within seconds. A file above the AI-extraction size limit enters the queue without a proposal and waits for a person to fill it in. The guided setup ends with an optional step to invite the rest of the team, which an administrator can skip and return to later.
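The routing described above reduces to one decision per scanned file. The sketch below is a minimal illustration under stated assumptions: the limit value is a placeholder (this page does not give the real one), and AWAITING_PERSON is a hypothetical name for the state of an artifact that enters the queue without a proposal.

```python
from enum import Enum


class Status(Enum):
    AI_PROPOSED = "AI Proposed"          # status named on this page
    AWAITING_PERSON = "Awaiting Person"  # hypothetical name, not from the product

# Placeholder only; the actual AI-extraction size limit is not stated here.
AI_SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def status_after_scan(size_bytes: int, ai_enabled: bool,
                      limit: int = AI_SIZE_LIMIT_BYTES) -> Status:
    """Decide whether a scanned file gets an AI proposal or waits for a person."""
    if ai_enabled and size_bytes <= limit:
        return Status.AI_PROPOSED
    # AI is off, or the file is above the limit: no proposal is made.
    return Status.AWAITING_PERSON
```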

Review the first artifact

Reviewing happens in the review queue, reached from the main navigation. The artifact scanned a moment ago appears there with its AI-proposed values already filled in. Opening it shows the labeling form: each field, the value AI proposed, and a confidence rating of High, Medium, or Low that points the reviewer at where to look hardest.

A proposal is a starting point, not a verdict. The reviewer keeps a value, edits it, or replaces it outright, and the AI's original answer is preserved in every case. When every required answer is present and the validation rules pass, the reviewer accepts the label. The artifact's status moves to Tagged, and its values now count as labeled data. That acceptance, and every change behind it, is written to an audit trail that cannot be edited after the fact.

For a document with no AI proposal, because AI assistance is off or the file was too large to extract, the same form opens empty and the reviewer enters the values by hand before accepting. The flow from there is identical.
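The review rules in this section can be sketched as a small model: the AI's value is stored separately from the reviewer's, acceptance is refused while a required answer is missing, and each action is appended to a log. All names here (Answer, Artifact, edit, accept) are hypothetical, and the sketch only appends to a list; it does not enforce the tamper-proofing the real audit trail provides.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Answer:
    ai_value: Optional[Any] = None        # original proposal, never overwritten
    reviewer_value: Optional[Any] = None  # set only when a person enters a value

    @property
    def value(self):
        # A reviewer's entry wins; otherwise the kept AI proposal stands.
        return self.reviewer_value if self.reviewer_value is not None else self.ai_value


@dataclass
class Artifact:
    answers: dict
    status: str = "AI Proposed"
    audit: list = field(default_factory=list)  # append-only by convention in this sketch


def edit(artifact, name, new_value, reviewer):
    """Record a reviewer's value without touching the AI's original answer."""
    artifact.answers.setdefault(name, Answer()).reviewer_value = new_value
    artifact.audit.append((reviewer, "edit", name))


def accept(artifact, required, reviewer):
    """Move the artifact to Tagged only when every required answer is present."""
    missing = [n for n in required
               if n not in artifact.answers or artifact.answers[n].value is None]
    if missing:
        raise ValueError(f"missing required answers: {missing}")
    artifact.status = "Tagged"
    artifact.audit.append((reviewer, "accept", None))
```

An artifact with no proposal fits the same model: every Answer starts with ai_value set to None, and the reviewer fills each one through edit before calling accept.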

Export

A single labeled artifact is enough to produce a dataset and confirm the loop works end to end. From the exports page, an on-demand export pulls the accepted labels and writes them to a single file in the chosen format, then hands back a download link. The export runs in the background and the link holds for 48 hours.

A first export will usually be a flat spreadsheet file, one row per labeled document, openable without any tooling. The same page produces formats built for analytics and training pipelines, and supports scheduled deliveries straight to a customer-controlled bucket or server. Each record carries both the labeled values and the provenance, including which schema version produced it and whether each value came from AI or a person. The formats, destinations, scheduling, and what a record contains are covered in docs/data/exports.md.
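To make the one-row-per-document shape concrete, the sketch below writes a flat CSV in which each labeled value sits next to a column saying where it came from. The column names, the record layout, and the export_rows function are assumptions made for illustration; the actual export format is described in docs/data/exports.md.

```python
import csv
import io


def export_rows(records, field_names):
    """Write one CSV row per labeled document, with per-value provenance columns."""
    columns = ["artifact_id", "schema_version"]
    for name in field_names:
        columns += [name, f"{name}_source"]  # value, then "ai" or "person"

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        row = {"artifact_id": rec["artifact_id"],
               "schema_version": rec["schema_version"]}
        for name in field_names:
            value, source = rec["values"][name]
            row[name] = value
            row[f"{name}_source"] = source
        writer.writerow(row)
    return buf.getvalue()
```

Keeping the source next to each value means a downstream consumer can filter to person-confirmed values with an ordinary spreadsheet filter and no extra tooling.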

That round trip, from a connected source to a downloadable labeled dataset, is the core of what Piprio does. Everything past it (more connectors, more schemas, teams and routing rules, scheduled exports, and the quality and audit views) builds on the same five steps.