Private Inference

Run select models inside a hardware-secured enclave (TEE) — append :private to any private-capable model.

Private inference routes your request to a confidential-compute provider (NEAR AI Cloud, backed by Intel TDX + Phala dstack) where the prompt and completion are processed inside a Trusted Execution Environment. The gateway verifies the enclave's attestation on every request and fails closed: a :private request is never silently downgraded to a non-confidential provider.

What this protects

The TEE protects your prompt against the inference provider's own operators and infrastructure. The Bankr gateway still sits in the request path (it terminates your TLS and forwards to the enclave), so this is not end-to-end encryption against Bankr. See Privacy levels below.

Enabling private inference

Private inference is opt-in per request; there is no account-level setting. Use either form:

Append :private to the model ID, e.g. glm-5.2:private
Send "private": true in the request body

Only a trailing :private token (lowercase) is treated as the opt-in.

curl -X POST https://llm.bankr.bot/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: bk_YOUR_API_KEY" \
  -d '{
    "model": "glm-5.2:private",
    "messages": [{"role": "user", "content": "Summarize this contract..."}]
  }'

Body-field form (works on /v1/chat/completions and /v1/messages):

curl -X POST https://llm.bankr.bot/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-API-Key: bk_YOUR_API_KEY" \
  -d '{
    "model": "glm-5.2",
    "private": true,
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
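
The two opt-in forms above can also be built programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_private_request` and its `use_suffix` parameter are hypothetical, and only the endpoint, headers, and body fields shown in the curl examples are taken from this page. It builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://llm.bankr.bot"


def build_private_request(api_key, model, messages, use_suffix=True):
    """Build (but do not send) a private chat-completions request.

    Hypothetical helper: use_suffix=True appends ":private" to the model ID,
    use_suffix=False sends "private": true in the body instead.
    """
    body = {"messages": messages}
    if use_suffix:
        # Only a trailing, lowercase ":private" token counts as the opt-in.
        body["model"] = model if model.endswith(":private") else model + ":private"
    else:
        body["model"] = model
        body["private"] = True
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate makes the opt-in easy to inspect or log before any data leaves the client.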

Supported models

Private inference is available on a set of open-weight models (frontier proprietary models — Claude, Gemini, GPT — are not served confidentially). The live set currently covers models from the DeepSeek, GLM (Z.ai), Kimi (Moonshot), MiniMax, and Gemma families.

The source of truth is the /v1/models endpoint: a private-capable model carries "private": true and, once attested, "confidential": true with "attested": "gateway". Check the flag rather than hard-coding the list — coverage expands as more confidential slots come online.

bankr llm models    # or GET /v1/models
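
As a rough illustration of checking the flag instead of hard-coding a list, the sketch below filters a parsed /v1/models response. It assumes an OpenAI-style wrapper (`{"data": [...]}`) with an `id` field per model; that shape is an assumption, while the `private`, `confidential`, and `attested` fields are the ones described above. The function name is hypothetical.

```python
def private_capable_ids(models_response):
    """Return IDs of private-capable models, listing attested ones first."""
    # Assumption: models are listed under a top-level "data" key.
    models = models_response.get("data", [])
    private = [m for m in models if m.get("private") is True]

    def attested(model):
        # Attested entries carry "confidential": true and "attested": "gateway".
        return model.get("confidential") is True and model.get("attested") == "gateway"

    # Stable sort: attested models (key False) come before unattested (key True).
    private.sort(key=lambda m: not attested(m))
    return [m["id"] for m in private]
```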

Response headers

On a verified :private request the gateway stamps the attested enclave identity onto the response so you can cross-check it:

Header	Description
`X-Confidential-Verified`	`true` when the enclave attestation was verified this request
`X-Confidential-Signer`	The ed25519 signing address bound in the enclave's attestation report
`X-Confidential-Tcb`	Trusted Computing Base status reported by the enclave (e.g. `OutOfDate`)

The X-Confidential-Signer value matches the signing key in the attestation report (below), so a client can confirm it's talking to the same enclave the gateway verified.
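
A client-side check of these headers might look like the sketch below. The function name and return shape are hypothetical; it takes any mapping of response headers plus, optionally, a signer value you obtained separately (for example from the attestation report) and compares them exactly, since the encoding of the signer string is not specified here.

```python
def check_confidential_headers(headers, expected_signer=None):
    """Summarize the X-Confidential-* headers of a :private response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    verified = normalized.get("x-confidential-verified", "").strip().lower() == "true"
    signer = normalized.get("x-confidential-signer")
    tcb = normalized.get("x-confidential-tcb")

    signer_ok = True
    if expected_signer is not None:
        # Exact comparison: the signer's encoding is not assumed.
        signer_ok = signer is not None and signer.strip() == expected_signer.strip()

    return {
        "verified": verified,
        "signer": signer,
        "tcb": tcb,
        "ok": verified and signer_ok,
    }
```

A caller would treat `ok == False` as a reason to discard the response, and could additionally apply its own policy to the `tcb` value.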

Verifying attestation yourself

GET /v1/attestation/report

Returns the provider's raw attestation report (Intel TDX quote + ed25519 signing key) so you can independently verify the enclave with a tool like @phala/dcap-qvl or Phala's explorer — you don't have to trust the gateway's word. The report is identical for all callers and briefly cached.

curl https://llm.bankr.bot/v1/attestation/report \
  -H "X-API-Key: bk_YOUR_API_KEY"
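
Because the report's JSON field names are not documented on this page, the sketch below avoids guessing them: it only checks whether a given signer string (such as the `X-Confidential-Signer` header value) appears as a string value anywhere in the parsed report. The function name is hypothetical, and this is a consistency check only; it does not replace verifying the Intel TDX quote with a dedicated tool.

```python
def signer_in_report(report, signer):
    """Return True if `signer` appears as a string value anywhere in `report`."""
    if isinstance(report, str):
        return report == signer
    if isinstance(report, dict):
        return any(signer_in_report(value, signer) for value in report.values())
    if isinstance(report, list):
        return any(signer_in_report(item, signer) for item in report)
    # Numbers, booleans, and None cannot match a signer string.
    return False
```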

Error responses

A :private request fails loudly rather than downgrading:

Status	Code	Meaning
`422`	`confidential_unavailable`	The requested model has no private (TEE) compute slot. Drop `:private` or pick a private-capable model.
`503`	`attestation_unverified`	The enclave's attestation could not be verified right now (fail-closed). Retry shortly.

{
  "error": {
    "message": "Model 'gpt-5.5' does not offer a private (:private) compute environment",
    "type": "invalid_request_error",
    "code": "confidential_unavailable"
  }
}
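
One way a client could act on these two errors is sketched below. The function names, the returned labels, and the backoff schedule are all assumptions for illustration; the status codes and error codes come from the table above. The key point is that neither branch falls back to a non-private request automatically.

```python
def classify_private_error(status, payload):
    """Map a failed :private response to a suggested client action."""
    code = (payload.get("error") or {}).get("code")
    if status == 422 and code == "confidential_unavailable":
        # No TEE slot for this model: pick a private-capable model,
        # or drop :private only as an explicit user decision.
        return "no_private_slot"
    if status == 503 and code == "attestation_unverified":
        # Fail-closed attestation check: safe to retry shortly.
        return "retry"
    return "other"


def backoff_delays(attempts=3, base=1.0):
    """Exponential retry delays in seconds (the schedule is an assumption)."""
    return [base * 2 ** i for i in range(attempts)]
```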

Privacy levels

Private inference today gives you a gateway-verified, client-verifiable enclave:

Level	Guarantee	Status
Route to a TEE provider	Inference runs in confidential hardware	✅
Gateway verifies attestation (fail-closed)	The gateway checks the Intel TDX quote on every request and rejects on failure	✅
Client re-verifies	`/v1/attestation/report` lets you confirm the enclave yourself	✅
End-to-end (not even Bankr)	Client talks directly to the enclave (or via OHTTP) so the gateway never sees plaintext	Client-dependent; not available today

The first three levels work with any HTTP client, because the gateway does the verification for you. End-to-end ("not even Bankr sees the plaintext") is fundamentally different: it can only exist in a client that implements direct/OHTTP transport to the enclave itself — a plain request through llm.bankr.bot always terminates TLS at the gateway, so the gateway necessarily sees the plaintext. Delivering this would require a first-party client built for it (for example, a future Bankr mobile or desktop app) rather than a server-side change. It is not available today.

Pricing

Private requests bill from your LLM credit balance at the model's standard per-token rate — there is no separate confidential price today. See Max Mode pricing and credit management.

Next steps

API Reference — full endpoint documentation
Supported Models — complete model catalog
Overview — routing, payments, and data privacy
