
Private Inference

Run select models inside a hardware-secured enclave, a Trusted Execution Environment (TEE), by appending :private to the ID of any private-capable model.

Private inference routes your request to a confidential-compute provider (NEAR AI Cloud, backed by Intel TDX + Phala dstack) where the prompt and completion are processed inside a Trusted Execution Environment. The gateway verifies the enclave's attestation on every request and fails closed: a :private request is never silently downgraded to a non-confidential provider.

What this protects

The TEE protects your prompt against the inference provider's own operators and infrastructure. The Bankr gateway still sits in the request path (it terminates your TLS and forwards to the enclave), so this is not end-to-end encryption against Bankr. See Privacy levels below.

Enabling private inference

Opt in per request — no account setting. Use either form:

  • Append :private to the model ID, e.g. glm-5.2:private
  • Send "private": true in the request body

Only a trailing :private token (lowercase) is treated as the opt-in.

curl -X POST https://llm.bankr.bot/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: bk_YOUR_API_KEY" \
  -d '{
    "model": "glm-5.2:private",
    "messages": [{"role": "user", "content": "Summarize this contract..."}]
  }'

Body-field form (works on /v1/chat/completions and /v1/messages):

curl -X POST https://llm.bankr.bot/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-API-Key: bk_YOUR_API_KEY" \
  -d '{
    "model": "glm-5.2",
    "private": true,
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
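
The two opt-in forms can also be built programmatically. The following is a minimal Python sketch, not Bankr's implementation: the helper names are hypothetical, and the only rule it encodes from this page is that a trailing, lowercase :private token is the opt-in. It builds a /v1/chat/completions body only; sending it is left to your HTTP client.

```python
PRIVATE_SUFFIX = ":private"


def is_private_model_id(model_id: str) -> bool:
    """Return True only when the ID ends with the lowercase ':private' token."""
    return model_id.endswith(PRIVATE_SUFFIX) and len(model_id) > len(PRIVATE_SUFFIX)


def with_private_suffix(model_id: str) -> str:
    """Append ':private' unless the ID already carries it."""
    return model_id if is_private_model_id(model_id) else model_id + PRIVATE_SUFFIX


def chat_payload(model_id: str, prompt: str, use_body_flag: bool = False) -> dict:
    """Build a chat-completions body using either opt-in form (hypothetical helper)."""
    messages = [{"role": "user", "content": prompt}]
    if use_body_flag:
        # Body-field form: plain model ID plus "private": true.
        if is_private_model_id(model_id):
            model_id = model_id[: -len(PRIVATE_SUFFIX)]
        return {"model": model_id, "private": True, "messages": messages}
    # Suffix form: the model ID itself carries the opt-in.
    return {"model": with_private_suffix(model_id), "messages": messages}
```

Because the check is case-sensitive, an ID such as glm-5.2:PRIVATE is not recognized as an opt-in by this sketch, which mirrors the lowercase rule above.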

Supported models

Private inference is available on a set of open-weight models. Frontier proprietary models (Claude, Gemini, GPT) are not served confidentially. The live set currently covers models from the DeepSeek, GLM (Z.ai), Kimi (Moonshot), MiniMax, and Gemma families.

The source of truth is the /v1/models endpoint: a private-capable model carries "private": true and, once attested, "confidential": true with "attested": "gateway". Check the flag rather than hard-coding the list — coverage expands as more confidential slots come online.

bankr llm models    # or GET /v1/models
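
Checking the flags instead of hard-coding a list might look like the sketch below. It assumes the /v1/models response is an object with a "data" array whose entries have an "id" field (an OpenAI-style shape that this page does not confirm); only the "private", "confidential", and "attested" flags come from the text above. The function names and sample payload are hypothetical.

```python
def private_capable_ids(models_response: dict) -> list[str]:
    """IDs of models flagged "private": true (assumed response shape)."""
    return [
        m["id"]
        for m in models_response.get("data", [])
        if m.get("private") is True
    ]


def attested_ids(models_response: dict) -> list[str]:
    """IDs of private-capable models that the gateway has already attested."""
    return [
        m["id"]
        for m in models_response.get("data", [])
        if m.get("private") is True
        and m.get("confidential") is True
        and m.get("attested") == "gateway"
    ]


# Made-up payload for illustration only; real entries may carry other fields.
sample = {
    "data": [
        {"id": "glm-5.2", "private": True, "confidential": True, "attested": "gateway"},
        {"id": "example-pending", "private": True},
        {"id": "gpt-5.5"},
    ]
}
print(private_capable_ids(sample))
print(attested_ids(sample))
```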

Response headers

On a verified :private request the gateway stamps the attested enclave identity onto the response so you can cross-check it:

| Header | Description |
| --- | --- |
| X-Confidential-Verified | true when the enclave attestation was verified for this request |
| X-Confidential-Signer | The ed25519 signing address bound in the enclave's attestation report |
| X-Confidential-Tcb | Trusted Computing Base status reported by the enclave (e.g. OutOfDate) |

The X-Confidential-Signer value matches the signing key in the attestation report (below), so a client can confirm it's talking to the same enclave the gateway verified.
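
A client-side cross-check of these headers could be sketched as follows. The function name is hypothetical, and the sketch assumes you have already extracted the signing key from the attestation report yourself and that it can be compared to the header value as an exact string; the report's field layout is not described on this page.

```python
from typing import Mapping


def confidential_headers_ok(headers: Mapping[str, str], expected_signer: str) -> bool:
    """True when the response is marked verified and the signer matches."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    verified = lowered.get("x-confidential-verified", "").lower() == "true"
    signer = lowered.get("x-confidential-signer", "")
    # Assumption: exact string equality is the right comparison for the signer.
    return verified and signer != "" and signer == expected_signer.strip()
```

X-Confidential-Tcb is deliberately left out of the boolean result: whether a status such as OutOfDate is acceptable is a policy decision for the caller.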

Verifying attestation yourself

GET /v1/attestation/report

Returns the provider's raw attestation report (Intel TDX quote + ed25519 signing key) so you can independently verify the enclave with a tool like @phala/dcap-qvl or Phala's explorer — you don't have to trust the gateway's word. The report is identical for all callers and briefly cached.

curl https://llm.bankr.bot/v1/attestation/report \
  -H "X-API-Key: bk_YOUR_API_KEY"

Error responses

A :private request fails loudly rather than downgrading:

| Status | Code | Meaning |
| --- | --- | --- |
| 422 | confidential_unavailable | The requested model has no private (TEE) compute slot. Drop :private or pick a private-capable model. |
| 503 | attestation_unverified | The enclave's attestation could not be verified right now (fail-closed). Retry shortly. |

Example response body for the 422 case:

{
  "error": {
    "message": "Model 'gpt-5.5' does not offer a private (:private) compute environment",
    "type": "invalid_request_error",
    "code": "confidential_unavailable"
  }
}
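
One way a client might act on these two errors is sketched below. The function name and the returned action labels are hypothetical; the status codes, error codes, and the "error" object shape come from the table and example on this page.

```python
def private_error_action(status: int, body: dict) -> str:
    """Map a failed :private response to 'change_model', 'retry', or 'raise'."""
    error = body.get("error")
    code = error.get("code") if isinstance(error, dict) else None
    if status == 422 and code == "confidential_unavailable":
        # No TEE slot for this model: retrying the same request will not help.
        return "change_model"
    if status == 503 and code == "attestation_unverified":
        # Fail-closed attestation check: safe to retry after a short delay.
        return "retry"
    return "raise"
```

The sketch never drops :private on its own. Falling back to a non-confidential request changes the privacy guarantee, so that choice is left to the caller.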

Privacy levels

Private inference today gives you a gateway-verified, client-verifiable enclave:

| Level | Guarantee | Status |
| --- | --- | --- |
| Route to a TEE provider | Inference runs in confidential hardware | Available |
| Gateway verifies attestation (fail-closed) | The gateway checks the Intel TDX quote on every request and rejects on failure | Available |
| Client re-verifies | /v1/attestation/report lets you confirm the enclave yourself | Available |
| End-to-end (not even Bankr) | Client talks directly to the enclave (or via OHTTP) so the gateway never sees plaintext | Client-dependent |

The first three levels work with any HTTP client, because the gateway does the verification for you. End-to-end ("not even Bankr sees the plaintext") is fundamentally different: it can only exist in a client that implements direct/OHTTP transport to the enclave itself — a plain request through llm.bankr.bot always terminates TLS at the gateway, so the gateway necessarily sees the plaintext. Delivering this would require a first-party client built for it (for example, a future Bankr mobile or desktop app) rather than a server-side change. It is not available today.

Pricing

Private requests bill from your LLM credit balance at the model's standard per-token rate — there is no separate confidential price today. See Max Mode pricing and credit management.

Next steps