Private Inference
Run select models inside a hardware-secured enclave (TEE) by appending :private to any private-capable model ID.
Private inference routes your request to a confidential-compute provider (NEAR AI Cloud, backed by Intel TDX + Phala dstack), where the prompt and completion are processed inside a Trusted Execution Environment. The gateway verifies the enclave's attestation on every request and fails closed: a :private request is never silently downgraded to a non-confidential provider.
The TEE protects your prompt against the inference provider's own operators and infrastructure. The Bankr gateway still sits in the request path (it terminates your TLS and forwards to the enclave), so this is not end-to-end encryption against Bankr. See Privacy levels below.
Enabling private inference
Opt in per request; there is no account-level setting. Use either form:
- Append :private to the model ID, e.g. glm-5.2:private
- Send "private": true in the request body
Only a trailing :private token (lowercase) is treated as the opt-in.
curl -X POST https://llm.bankr.bot/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: bk_YOUR_API_KEY" \
-d '{
"model": "glm-5.2:private",
"messages": [{"role": "user", "content": "Summarize this contract..."}]
}'
Body-field form (works on /v1/chat/completions and /v1/messages):
curl -X POST https://llm.bankr.bot/v1/messages \
-H "Content-Type: application/json" \
-H "X-API-Key: bk_YOUR_API_KEY" \
-d '{
"model": "glm-5.2",
"private": true,
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello!"}]
}'
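The two opt-in forms can also be built from code. The sketch below is a minimal illustration using only the Python standard library. build_private_chat_request and is_private_opt_in are hypothetical helper names, and is_private_opt_in is a client-side reading of the trailing-lowercase rule above, not the gateway's own implementation. The request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://llm.bankr.bot"


def is_private_opt_in(body: dict) -> bool:
    """Client-side reading of the opt-in rule (an assumption, not gateway code):
    a trailing, lowercase ':private' on the model ID, or "private": true."""
    model = body.get("model", "")
    return model.endswith(":private") or body.get("private") is True


def build_private_chat_request(api_key: str, model: str, messages: list,
                               use_suffix: bool = True) -> urllib.request.Request:
    """Build (but do not send) a private /v1/chat/completions request.

    use_suffix=True uses the model-ID form (e.g. glm-5.2:private);
    use_suffix=False uses the "private": true body-field form.
    """
    body = {"messages": messages}
    if use_suffix:
        body["model"] = f"{model}:private"
    else:
        body["model"] = model
        body["private"] = True
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it. Under this reading, a model ID such as glm-5.2:PRIVATE does not satisfy is_private_opt_in, matching the lowercase-only rule.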
Supported models
Private inference is available on a set of open-weight models (frontier proprietary models — Claude, Gemini, GPT — are not served confidentially). The live set currently covers models from the DeepSeek, GLM (Z.ai), Kimi (Moonshot), MiniMax, and Gemma families.
The source of truth is the /v1/models endpoint: a private-capable model carries "private": true and, once attested, "confidential": true with "attested": "gateway". Check the flag rather than hard-coding the list — coverage expands as more confidential slots come online.
bankr llm models # or GET /v1/models
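Filtering the model list on those flags might look like the following Python sketch. It assumes the /v1/models response wraps entries in an OpenAI-style "data" array with an "id" per model; that envelope, the private_model_ids name, and the sample entries are illustrative assumptions. Only the "private", "confidential", and "attested" fields come from the description above.

```python
import json


def private_model_ids(models_response: dict, require_attested: bool = True) -> list:
    """Return IDs of models flagged "private": true.

    With require_attested=True, also require "confidential": true and
    "attested": "gateway", i.e. the gateway has verified the enclave.
    The "data"/"id" envelope is an assumption about the response shape.
    """
    ids = []
    for entry in models_response.get("data", []):
        if entry.get("private") is not True:
            continue
        if require_attested and not (
            entry.get("confidential") is True and entry.get("attested") == "gateway"
        ):
            continue
        ids.append(entry["id"])
    return ids


# Hypothetical sample payload, for illustration only.
SAMPLE = json.loads("""
{"data": [
  {"id": "glm-5.2", "private": true, "confidential": true, "attested": "gateway"},
  {"id": "example-pending", "private": true},
  {"id": "gpt-5.5", "private": false}
]}
""")

print(private_model_ids(SAMPLE))                          # ['glm-5.2']
print(private_model_ids(SAMPLE, require_attested=False))  # ['glm-5.2', 'example-pending']
```

Reading the flags at runtime, rather than shipping a fixed list, keeps a client working as coverage changes.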
Response headers
On a verified :private request the gateway stamps the attested enclave identity onto the response so you can cross-check it:
| Header | Description |
|---|---|
| X-Confidential-Verified | true when the enclave attestation was verified this request |
| X-Confidential-Signer | The ed25519 signing address bound in the enclave's attestation report |
| X-Confidential-Tcb | Trusted Computing Base status reported by the enclave (e.g. OutOfDate) |
The X-Confidential-Signer value matches the signing key in the attestation report (below), so a client can confirm it's talking to the same enclave the gateway verified.
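A client-side cross-check could be sketched in Python as below. It reads the three headers case-insensitively and compares X-Confidential-Signer with a signing address you have already extracted from /v1/attestation/report. The report's field layout is not described on this page, so extraction is left to the caller; confidential_status and signer_matches are hypothetical names, and the sample signer value is made up.

```python
from typing import Mapping, Optional


def confidential_status(headers: Mapping[str, str]) -> dict:
    """Collect the X-Confidential-* response headers (names are case-insensitive)."""
    h = {k.lower(): v for k, v in headers.items()}
    return {
        "verified": h.get("x-confidential-verified", "").strip().lower() == "true",
        "signer": h.get("x-confidential-signer"),
        "tcb": h.get("x-confidential-tcb"),
    }


def signer_matches(headers: Mapping[str, str], report_signer: Optional[str]) -> bool:
    """True only if the gateway reports a verified enclave whose signer equals
    the signing address taken from /v1/attestation/report (compared exactly)."""
    status = confidential_status(headers)
    return (
        status["verified"]
        and report_signer is not None
        and status["signer"] == report_signer
    )


# Hypothetical response headers; the signer value is illustrative only.
resp_headers = {
    "X-Confidential-Verified": "true",
    "X-Confidential-Signer": "0xSIGNER_FROM_REPORT",
    "X-Confidential-Tcb": "OutOfDate",
}
print(confidential_status(resp_headers)["tcb"])              # OutOfDate
print(signer_matches(resp_headers, "0xSIGNER_FROM_REPORT"))  # True
```

The sketch surfaces the TCB status instead of rejecting on it; whether a value such as OutOfDate is acceptable is a policy choice for your client.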
Verifying attestation yourself
GET /v1/attestation/report
Returns the provider's raw attestation report (Intel TDX quote + ed25519 signing key) so you can independently verify the enclave with a tool like @phala/dcap-qvl or Phala's explorer — you don't have to trust the gateway's word. The report is identical for all callers and briefly cached.
curl https://llm.bankr.bot/v1/attestation/report \
-H "X-API-Key: bk_YOUR_API_KEY"
Error responses
A :private request fails loudly rather than downgrading:
| Status | Code | Meaning |
|---|---|---|
| 422 | confidential_unavailable | The requested model has no private (TEE) compute slot. Drop :private or pick a private-capable model. |
| 503 | attestation_unverified | The enclave's attestation could not be verified right now (fail-closed). Retry shortly. |
{
"error": {
"message": "Model 'gpt-5.5' does not offer a private (:private) compute environment",
"type": "invalid_request_error",
"code": "confidential_unavailable"
}
}
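Because the gateway fails closed, retry logic belongs in the client. The Python sketch below retries only on 503 attestation_unverified with exponential backoff, raises immediately on 422 confidential_unavailable, and never falls back to a non-private model. The send callable (returning an HTTP status and parsed JSON body), the exception classes, and the retry parameters are hypothetical choices, not part of the Bankr API.

```python
import time
from typing import Callable, Tuple


class ConfidentialUnavailable(Exception):
    """422: the model has no private (TEE) compute slot; retrying will not help."""


class AttestationUnverified(Exception):
    """503: attestation could not be verified after all retries."""


def call_private(send: Callable[[], Tuple[int, dict]], max_attempts: int = 3,
                 base_delay: float = 0.5, sleep=time.sleep) -> dict:
    """Call send() until it succeeds, retrying only the fail-closed 503 case.

    send is a hypothetical callable that performs one :private request and
    returns (status_code, parsed_json_body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 400:
            return body
        error = body.get("error", {})
        code = error.get("code")
        if status == 422 and code == "confidential_unavailable":
            raise ConfidentialUnavailable(error.get("message", code))
        if status == 503 and code == "attestation_unverified":
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
                continue
            raise AttestationUnverified(error.get("message", code))
        raise RuntimeError(f"unexpected error {status}: {code}")
    raise AssertionError("unreachable")


# Usage with a fake transport: one 503, then success.
responses = iter([
    (503, {"error": {"code": "attestation_unverified", "message": "retry"}}),
    (200, {"choices": []}),
])
print(call_private(lambda: next(responses), sleep=lambda s: None))  # {'choices': []}
```

Injecting send and sleep keeps the policy independent of any particular HTTP library and makes it easy to test without network access.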
Privacy levels
Private inference today gives you a gateway-verified, client-verifiable enclave:
| Level | Guarantee | Status |
|---|---|---|
| Route to a TEE provider | Inference runs in confidential hardware | ✅ |
| Gateway verifies attestation (fail-closed) | The gateway checks the Intel TDX quote on every request and rejects on failure | ✅ |
| Client re-verifies | /v1/attestation/report lets you confirm the enclave yourself | ✅ |
| End-to-end (not even Bankr) | Client talks directly to the enclave (or via OHTTP) so the gateway never sees plaintext | Client-dependent |
The first three levels work with any HTTP client, because the gateway does the verification for you. End-to-end confidentiality ("not even Bankr sees the plaintext") is fundamentally different: it can only exist in a client that implements direct or OHTTP transport to the enclave itself. A plain request through llm.bankr.bot always terminates TLS at the gateway, so the gateway necessarily sees the plaintext. Delivering this would require a first-party client built for it (for example, a future Bankr mobile or desktop app) rather than a server-side change. It is not available today.
Pricing
Private requests are billed from your LLM credit balance at the model's standard per-token rate; there is no separate confidential price today. See Max Mode pricing and credit management.
Next steps
- API Reference — full endpoint documentation
- Supported Models — complete model catalog
- Overview — routing, payments, and data privacy