Skip to content

BYO LLM

Piprio runs AI extraction against a language model. By default that model is the one Piprio operates. On the Enterprise plan, an organization can point extraction at a model it controls instead, including a model that runs entirely inside the customer's own network. The data Piprio sends to that model, and the cost of running it, then sit with the customer.

This matters for two reasons buyers usually raise early. The first is data residency: a regulated manufacturer often cannot send drawing text or part metadata to a third-party model, and an on-prem endpoint keeps that content inside the corporate boundary. The second is cost and model choice: an organization may already run a tuned model on its own hardware, and pointing Piprio at it avoids paying twice.

Bringing an organization's own model is configured once per organization by an administrator or owner. It is an Enterprise capability, so the option does not appear on other plans. There is one model per organization, not one per document type or per schema. Every team and every workflow in that organization extracts through the same model the administrator selects.

Supported providers

An organization can choose from three options.

  • Anthropic. Extraction runs against Anthropic's hosted models using the organization's Anthropic key.
  • OpenAI. Extraction runs against OpenAI's hosted models using the organization's OpenAI key.
  • An OpenAI-compatible endpoint. Extraction runs against any endpoint that speaks the OpenAI chat-completions protocol, at a network address the administrator supplies.

The third option is the one most on-prem deployments use, and it covers the widest range of setups. The first two are for teams that simply want their own hosted account and billing relationship rather than Piprio's.

On-prem endpoints

A self-hosted or air-gapped model is reached through the OpenAI-compatible option. The administrator gives Piprio the address of the endpoint, the model name to call, and a key. Anything that exposes the OpenAI chat-completions protocol works at that address, including the common ways teams serve models on their own hardware: vLLM, Ollama, LiteLLM, LM Studio, and self-hosted OpenAI proxies.

The endpoint address has to be a real URL using standard web transport. Piprio checks that before saving, so a typo or a bare hostname is caught at configuration time rather than at the first extraction. For the two hosted options there is no address to enter, and Piprio reaches the public service directly.

One requirement applies to whatever the organization puts behind that address. The model has to return structured output in JSON form, because that is how Piprio reads extracted field values back. Most current instruction-tuned models support this. A model that does not will be reported when an administrator tests the connection, before it ever runs against real documents.

API keys and secrets

The key an administrator enters is encrypted before it is stored, using AES-256-GCM. It is never written to Piprio's database in plain form, never returned by any screen or report that displays the configuration, and never written to the audit trail or any log. When an administrator views the saved configuration later, Piprio shows that a key is on file without ever revealing it.

The plain key exists only at one moment: the instant a document is being extracted, in the memory of the process making the call to the organization's model, and only for the duration of that call. Nothing about that process keeps the key afterward.

A few details make the configuration easier to live with. The first time an administrator saves a model, a key is required. After that, the model name or the endpoint address can change without re-typing the secret. Leaving the key blank keeps the stored one. Supplying a new key replaces the old one. Every save and every removal of the configuration is recorded in the audit trail, with the provider, the model name, and (for an on-prem endpoint) the address. The key itself, encrypted or not, never appears in those records.

Compatibility and testing

For an on-prem or self-hosted model, the only contract Piprio depends on is the OpenAI chat-completions protocol with structured JSON output. If the endpoint honors that, Piprio can drive it. The endpoint does not have to match any Piprio-specific format.

Before committing a model to live extraction, an administrator can test it. The test sends one small, extraction-shaped request to the provider and reports whether the round trip worked. A pass means the model answered and returned a value Piprio could read. A failure comes back with a plain reason rather than a raw error, so an administrator can tell apart the cases that matter: the key was rejected, the model name was not found, the model does not return structured JSON, the endpoint could not be reached, or the call took too long. The test names which one happened, and it reports how long the call took, so a fast rejection looks different from a slow timeout.

Saving and testing are independent. Piprio does not force a passing test before saving, so an administrator can stage a configuration ahead of an endpoint coming online. A model that does not actually work will surface that the next time extraction runs.

Asynchronous by design

Extraction never runs inside the screens a team uses. It runs in the background, on its own. Opening a document, reviewing a label, or browsing the queue does not wait on a model call, and a slow or stuck model never freezes the interface. This holds whether the model is Piprio's own or one the organization brought, and it is why a slow on-prem endpoint degrades into slower extractions rather than a slower product.

Fallback behavior

There is no automatic switching between models. Once an organization has its own model configured, every extraction uses it. If that model becomes unreachable, returns a bad key, or rejects a request, the affected document's extraction is marked as failed with the reason attached, rather than quietly falling back to Piprio's model. That choice is deliberate: a regulated customer that requires its own model would not want document content silently routed elsewhere because of a transient outage.

To return to Piprio's own model, an administrator removes the configuration. With no model on file, an organization extracts through the platform default exactly as it did before any model was set.