API Reference
The LLM Gateway provides two API formats: OpenAI-compatible and Anthropic-compatible.
Base URL
https://llm.bankr.bot
Authentication
All requests except the health check require a Bankr API key (bk_...), sent either in the X-API-Key header or as an Authorization: Bearer token:
X-API-Key: bk_YOUR_API_KEY
or
Authorization: Bearer bk_YOUR_API_KEY
Generate API keys at bankr.bot/api-keys.
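The two header styles above can be produced by one small helper. This is a minimal Python sketch, not part of any official SDK: the function name `auth_headers` and the `use_bearer` flag are hypothetical, and the `bk_` prefix check is an assumption based on the key format shown above.

```python
def auth_headers(api_key: str, use_bearer: bool = False) -> dict:
    """Return request headers for one of the two documented auth styles."""
    # Assumption: every Bankr API key starts with "bk_", as in the examples above.
    if not api_key.startswith("bk_"):
        raise ValueError("expected a Bankr API key starting with 'bk_'")
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}
```

Either returned dictionary can be merged into the headers of any authenticated request.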
OpenAI-Compatible API
Chat Completions
POST /v1/chat/completions
Create a chat completion using OpenAI format.
Request
curl -X POST https://llm.bankr.bot/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: bk_YOUR_API_KEY" \
-d '{
"model": "claude-opus-4.8",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 1024
}'
Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1706123456,
"model": "claude-opus-4.8",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 20,
"completion_tokens": 10,
"total_tokens": 30
}
}
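The same request can be issued from Python using only the standard library. The sketch below mirrors the curl example and reads the reply from `choices[0].message.content`, as in the response above. The helper names (`build_chat_request`, `first_reply`, `chat`) are hypothetical, and error handling is omitted.

```python
import json
import urllib.request

BASE_URL = "https://llm.bankr.bot"

def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

def first_reply(completion: dict) -> str:
    """Return the assistant text of the first choice."""
    return completion["choices"][0]["message"]["content"]

def chat(api_key, model, messages, **params):
    """Send the request and return the reply text (performs network I/O)."""
    request = build_chat_request(api_key, model, messages, **params)
    with urllib.request.urlopen(request) as response:
        return first_reply(json.load(response))
```

For example, `chat("bk_YOUR_API_KEY", "claude-opus-4.8", [{"role": "user", "content": "Hello!"}], max_tokens=1024)` would send the equivalent of the curl request above, minus the system message and temperature.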
List Models
GET /v1/models
List available models.
Response
{
"object": "list",
"data": [
{"id": "claude-opus-4.8", "object": "model", "owned_by": "anthropic"},
{"id": "claude-opus-4.7", "object": "model", "owned_by": "anthropic"},
{"id": "claude-opus-4.6", "object": "model", "owned_by": "anthropic"},
{"id": "claude-sonnet-4.6", "object": "model", "owned_by": "anthropic"},
{"id": "claude-haiku-4.5", "object": "model", "owned_by": "anthropic"},
{"id": "gemini-3.1-pro", "object": "model", "owned_by": "google"},
{"id": "gemini-3-flash", "object": "model", "owned_by": "google"},
{"id": "gemma-4-31b-it", "object": "model", "owned_by": "google"},
{"id": "gpt-5.4", "object": "model", "owned_by": "openai"},
{"id": "gpt-5.2", "object": "model", "owned_by": "openai"},
{"id": "gpt-5.2-codex", "object": "model", "owned_by": "openai"},
{"id": "grok-4.20", "object": "model", "owned_by": "x-ai"},
{"id": "glm-5.1", "object": "model", "owned_by": "z-ai"},
{"id": "deepseek-v3.2", "object": "model", "owned_by": "deepseek"},
{"id": "minimax-m2.7", "object": "model", "owned_by": "minimax"},
{"id": "kimi-k2.6", "object": "model", "owned_by": "moonshotai"},
{"id": "kimi-k2.5", "object": "model", "owned_by": "moonshotai"},
{"id": "qwen3.5-plus", "object": "model", "owned_by": "qwen"}
]
}
The list above is abbreviated. Call /v1/models or run bankr llm models for the full live catalog, or see Supported Models.
A model that can serve private (TEE) inference carries "private": true; once its enclave is attested it also reports "confidential": true and "attested": "gateway".
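To pick out the private-capable models from a /v1/models response, a client can filter on the flags described above. A minimal sketch, assuming the flags appear as top-level fields on each model object; the function name and the `attested_only` parameter are hypothetical.

```python
def private_model_ids(models_response: dict, attested_only: bool = False) -> list:
    """Return IDs of models flagged "private": true.

    With attested_only=True, keep only models that also report
    "confidential": true (that is, whose enclave has been attested).
    """
    ids = []
    for model in models_response.get("data", []):
        if not model.get("private"):
            continue
        if attested_only and not model.get("confidential"):
            continue
        ids.append(model["id"])
    return ids
```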
Anthropic-Compatible API
Messages
POST /v1/messages
Create a message using Anthropic format. Ideal for Claude Code and Anthropic SDK users.
Request
curl -X POST https://llm.bankr.bot/v1/messages \
-H "Content-Type: application/json" \
-H "X-API-Key: bk_YOUR_API_KEY" \
-d '{
"model": "claude-opus-4.8",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello!"}
]
}'
Response
{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello! How can I help you today?"
}
],
"model": "claude-opus-4.8",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 10,
"output_tokens": 12
}
}
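Unlike the OpenAI format, the Anthropic response carries its text in a list of content blocks and reports usage as input_tokens and output_tokens. The sketch below extracts both from a parsed response; the function names are hypothetical, and skipping blocks whose type is not "text" is an assumption about how a client would want to treat them.

```python
def message_text(message: dict) -> str:
    """Concatenate the text of every "text" content block."""
    return "".join(
        block.get("text", "")
        for block in message.get("content", [])
        if block.get("type") == "text"
    )

def message_tokens(message: dict) -> int:
    """Sum input and output tokens from the usage object."""
    usage = message.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
```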
Private (Confidential) Inference
Available on both /v1/chat/completions (OpenAI format) and /v1/messages (Anthropic format).
Append :private to a private-capable model ID, or send "private": true in the request body, to route the request into a hardware-secured enclave. The gateway verifies the enclave's attestation and fails closed: it returns 422 confidential_unavailable if the model has no TEE slot, and 503 attestation_unverified if the attestation cannot be verified. Verified responses carry the X-Confidential-Verified, X-Confidential-Signer, and X-Confidential-Tcb headers.
curl -X POST https://llm.bankr.bot/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: bk_YOUR_API_KEY" \
-d '{
"model": "glm-5.2:private",
"messages": [{"role": "user", "content": "Hello!"}]
}'
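A client might wrap the private-routing convention and the response headers described above in two small helpers. This is a sketch under stated assumptions: the helper names are hypothetical, and because the header values are not documented here, the code only collects them without interpreting them.

```python
PRIVATE_SUFFIX = ":private"

# Fail-closed status codes and error names, as listed in this section.
PRIVATE_FAILURES = {422: "confidential_unavailable", 503: "attestation_unverified"}

def to_private(model_id: str) -> str:
    """Append ":private" unless the model ID already ends with it."""
    if model_id.endswith(PRIVATE_SUFFIX):
        return model_id
    return model_id + PRIVATE_SUFFIX

def confidential_headers(headers: dict) -> dict:
    """Collect the X-Confidential-* headers (header names are case-insensitive)."""
    wanted = ("x-confidential-verified", "x-confidential-signer", "x-confidential-tcb")
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in wanted if name in lowered}
```

A caller that requires confidential handling could refuse to use a response when `confidential_headers` returns fewer than three entries; whether that check is sufficient depends on details covered in Private Inference.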
Attestation Report
GET /v1/attestation/report
Returns the provider's raw attestation report (Intel TDX quote + ed25519 signing key) so a client can independently verify the enclave. See Private Inference for the full flow.
Health Check
GET /health
Check gateway and provider health. No authentication required.
Response
{
"status": "ok",
"providers": {
"vertexGemini": true,
"vertexClaude": true,
"openrouter": true
}
}
Status codes:
- 200: At least one provider is healthy
- 503: All providers are unavailable
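A monitoring script might combine the status code with the per-provider flags from the response body. A minimal sketch with hypothetical function names, assuming the body has the shape shown above:

```python
def gateway_available(status_code: int) -> bool:
    """True when /health returned 200, meaning at least one provider is healthy."""
    return status_code == 200

def unhealthy_providers(health: dict) -> list:
    """Return the names of providers reported as not healthy, sorted."""
    providers = health.get("providers", {})
    return sorted(name for name, healthy in providers.items() if not healthy)
```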
Error Responses
401 Unauthorized
{
"error": {
"message": "Unauthorized",
"type": "auth_error"
}
}
429 Rate Limited
{
"error": {
"message": "Too many requests, please try again later.",
"type": "rate_limit_error"
}
}
500 Server Error
{
"error": {
"message": "Internal server error",
"type": "server_error"
}
}
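All three error bodies share the same envelope, so a client can parse them with one function. In the sketch below, the function name and the returned dictionary shape are hypothetical, and treating 429, 500, and 503 as retryable is a common client-side convention assumed here, not something this reference specifies.

```python
import json

# Assumption: these statuses are worth retrying with backoff.
RETRYABLE_STATUSES = {429, 500, 503}

def parse_error(status: int, body: str) -> dict:
    """Extract type and message from an {"error": {...}} envelope."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        error = {}
    if not isinstance(error, dict):
        error = {}
    return {
        "status": status,
        "type": error.get("type", "unknown"),
        # Fall back to the raw body when no message field is present.
        "message": error.get("message", body),
        "retryable": status in RETRYABLE_STATUSES,
    }
```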
Usage
Get Usage Summary
GET /v1/usage?days=30
Returns aggregated token usage and cost breakdown for the authenticated API key. Requires authentication.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| days | number | 30 | Number of days to aggregate (1–90) |
Response
{
"object": "usage_summary",
"days": 30,
"startDate": "2026-01-28T00:00:00.000Z",
"endDate": "2026-02-27T00:00:00.000Z",
"totals": {
"totalRequests": 1981,
"totalInputTokens": 489789,
"totalOutputTokens": 631794,
"totalCacheReadInputTokens": 53460194,
"totalCacheWriteInputTokens": 12555591,
"totalTokens": 67137368,
"totalCost": 248.38,
"totalCacheCost": 208.01
},
"byModel": [
{
"model": "claude-opus-4.8",
"provider": "vertex-claude",
"requests": 1574,
"inputTokens": 6250,
"outputTokens": 500097,
"cacheReadInputTokens": 27309491,
"cacheWriteInputTokens": 7474005,
"totalTokens": 35289843,
"totalCost": 218.70,
"cacheCost": 181.10
}
]
}
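The totals object lends itself to a few derived figures. The sketch below computes the share of cost attributed to caching and checks that the four token counters add up to totalTokens, which holds for the sample response above. Two assumptions are made: that totalCacheCost is included in totalCost, and that the token identity holds in general. The function name is hypothetical.

```python
def summarize_usage(usage: dict) -> dict:
    """Derive a cache cost share and a token consistency check from totals."""
    totals = usage["totals"]
    token_sum = (
        totals["totalInputTokens"]
        + totals["totalOutputTokens"]
        + totals["totalCacheReadInputTokens"]
        + totals["totalCacheWriteInputTokens"]
    )
    total_cost = totals["totalCost"]
    return {
        # Assumption: totalCacheCost is a component of totalCost.
        "cache_cost_share": totals["totalCacheCost"] / total_cost if total_cost else 0.0,
        "tokens_consistent": token_sum == totals["totalTokens"],
    }
```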
Credits
Get Credit Balance
GET /v1/credits
Returns the current LLM credit balance for the API key's wallet. Requires authentication.
Use this to check available capacity before relying on the gateway. effectiveBalanceUsd is the most accurate measure of available balance because it nets out in-flight usage that has not been deducted yet. Balances are read directly from the database (not from a cached value) for accuracy.
Request
curl https://llm.bankr.bot/v1/credits \
-H "X-API-Key: bk_YOUR_API_KEY"
Response
{
"object": "credit_balance",
"balanceUsd": 12.34,
"effectiveBalanceUsd": 11.20,
"undeductedCostUsd": 1.14
}
Fields
| Field | Type | Description |
|---|---|---|
| balanceUsd | number | Total spendable credit on the wallet, in USD. |
| effectiveBalanceUsd | number | Available balance after subtracting in-flight usage not yet deducted. Floored at 0. Use this for capacity decisions. |
| undeductedCostUsd | number | Cost of in-flight/served requests not yet deducted from balanceUsd (the amount subtracted to derive effectiveBalanceUsd). |
Requests are rejected with 402 Payment Required once the effective balance is exhausted. There are no per-key spending caps — the balance shown is the full credit available to the key.
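The relationship between the three fields can be written out directly: the effective balance is the balance minus the undeducted cost, floored at 0. The sketch below restates that rule and adds a simple pre-flight check. The function names and the `estimated_cost_usd` parameter are hypothetical; how a caller estimates a request's cost is outside the scope of this reference.

```python
def derived_effective_balance(credits: dict) -> float:
    """Recompute effectiveBalanceUsd from the other two fields (floored at 0)."""
    return max(0.0, credits["balanceUsd"] - credits["undeductedCostUsd"])

def has_capacity(credits: dict, estimated_cost_usd: float) -> bool:
    """True when the effective balance is positive and covers the estimated cost."""
    effective = credits["effectiveBalanceUsd"]
    return effective > 0 and effective >= estimated_cost_usd
```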
Streaming
Both /v1/chat/completions and /v1/messages support streaming responses when "stream": true is set in the request body:
curl -X POST https://llm.bankr.bot/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: bk_YOUR_API_KEY" \
-d '{
"model": "claude-opus-4.8",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
Streaming uses Server-Sent Events (SSE) format.
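A client consumes the stream by reading `data:` lines and decoding each payload as JSON. The sketch below is a simplified parser, not a full SSE implementation: it treats each `data:` line as one complete event. It also assumes the OpenAI-style chunk shape (`choices[].delta.content`) and a final `data: [DONE]` sentinel, neither of which is confirmed by this reference. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator

def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each "data:" line, skipping all other lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):]
        # SSE allows one optional space after the colon.
        if payload.startswith(" "):
            payload = payload[1:]
        yield payload

def collect_text(lines: Iterable[str]) -> str:
    """Join streamed content deltas until the assumed [DONE] sentinel."""
    parts = []
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":  # Assumption: OpenAI-style end marker.
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

The `lines` argument can be any iterable of decoded text lines, such as the lines of an HTTP response body read incrementally.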