API Reference

The LLM Gateway provides two API formats: OpenAI-compatible and Anthropic-compatible.

Base URL

https://llm.bankr.bot

Authentication

Every endpoint except GET /health requires a Bankr API key (bk_...), sent either in the X-API-Key header or as an Authorization: Bearer token:

X-API-Key: bk_YOUR_API_KEY

or

Authorization: Bearer bk_YOUR_API_KEY

Generate API keys at bankr.bot/api-keys.
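As a rough illustration of the two header styles above, the following Python sketch builds either one. The helper name auth_headers and the bk_ prefix check are assumptions made for this example, not part of the gateway.

```python
def auth_headers(api_key: str, use_bearer: bool = False) -> dict[str, str]:
    """Return headers for one of the two documented authentication styles."""
    # Assumption: keys start with "bk_", as the placeholder in the docs suggests.
    if not api_key.startswith("bk_"):
        raise ValueError("expected a Bankr API key starting with 'bk_'")
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}
```

Either dictionary can be merged into the headers of any authenticated request.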


OpenAI-Compatible API

Chat Completions

POST /v1/chat/completions

Create a chat completion using OpenAI format.

Request

curl -X POST https://llm.bankr.bot/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: bk_YOUR_API_KEY" \
-d '{
"model": "claude-opus-4.8",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 1024
}'

Response

{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1706123456,
"model": "claude-opus-4.8",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 20,
"completion_tokens": 10,
"total_tokens": 30
}
}
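
The same request can be sketched in Python with only the standard library. The helper names build_chat_request and first_reply are hypothetical; the URL, headers, and field names come from the example above.

```python
import json
import urllib.request

BASE_URL = "https://llm.bankr.bot"

def build_chat_request(api_key: str, model: str, messages: list, **params) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

def first_reply(response: dict) -> str:
    """Return the assistant text of the first choice in a chat completion."""
    return response["choices"][0]["message"]["content"]

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as resp:
#       print(first_reply(json.load(resp)))
```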

List Models

GET /v1/models

List available models.

Response

{
"object": "list",
"data": [
{"id": "claude-opus-4.8", "object": "model", "owned_by": "anthropic"},
{"id": "claude-opus-4.7", "object": "model", "owned_by": "anthropic"},
{"id": "claude-opus-4.6", "object": "model", "owned_by": "anthropic"},
{"id": "claude-sonnet-4.6", "object": "model", "owned_by": "anthropic"},
{"id": "claude-haiku-4.5", "object": "model", "owned_by": "anthropic"},
{"id": "gemini-3.1-pro", "object": "model", "owned_by": "google"},
{"id": "gemini-3-flash", "object": "model", "owned_by": "google"},
{"id": "gemma-4-31b-it", "object": "model", "owned_by": "google"},
{"id": "gpt-5.4", "object": "model", "owned_by": "openai"},
{"id": "gpt-5.2", "object": "model", "owned_by": "openai"},
{"id": "gpt-5.2-codex", "object": "model", "owned_by": "openai"},
{"id": "grok-4.20", "object": "model", "owned_by": "x-ai"},
{"id": "glm-5.1", "object": "model", "owned_by": "z-ai"},
{"id": "deepseek-v3.2", "object": "model", "owned_by": "deepseek"},
{"id": "minimax-m2.7", "object": "model", "owned_by": "minimax"},
{"id": "kimi-k2.6", "object": "model", "owned_by": "moonshotai"},
{"id": "kimi-k2.5", "object": "model", "owned_by": "moonshotai"},
{"id": "qwen3.5-plus", "object": "model", "owned_by": "qwen"}
]
}

The list above is abbreviated — call /v1/models or bankr llm models for the full live catalog. See Supported Models for the complete list.

A model that can serve private (TEE) inference carries "private": true; once its enclave is attested it also reports "confidential": true and "attested": "gateway".
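
A minimal sketch of how a client might use these flags to pick out private-capable and attested models from a /v1/models response. The function names are hypothetical, and the sketch assumes the flags appear as top-level keys on each model entry.

```python
def private_models(models_response: dict) -> list[str]:
    """IDs of models that advertise private (TEE) inference."""
    return [m["id"] for m in models_response.get("data", []) if m.get("private") is True]

def attested_models(models_response: dict) -> list[str]:
    """IDs of private models whose enclave is reported as attested by the gateway."""
    return [
        m["id"]
        for m in models_response.get("data", [])
        if m.get("private") is True
        and m.get("confidential") is True
        and m.get("attested") == "gateway"
    ]
```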


Anthropic-Compatible API

Messages

POST /v1/messages

Create a message using Anthropic format. Ideal for Claude Code and Anthropic SDK users.

Request

curl -X POST https://llm.bankr.bot/v1/messages \
-H "Content-Type: application/json" \
-H "X-API-Key: bk_YOUR_API_KEY" \
-d '{
"model": "claude-opus-4.8",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello!"}
]
}'

Response

{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello! How can I help you today?"
}
],
"model": "claude-opus-4.8",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 10,
"output_tokens": 12
}
}
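
Because content is a list of typed blocks rather than a single string, a client usually has to join the text blocks itself. A small sketch, with message_text as a hypothetical helper name:

```python
def message_text(message: dict) -> str:
    """Concatenate the text of every "text" block in an Anthropic-format message."""
    return "".join(
        block.get("text", "")
        for block in message.get("content", [])
        if block.get("type") == "text"
    )
```

Blocks of any other type are skipped, so the helper returns an empty string when a response contains no text.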

Private (Confidential) Inference

Available on both /v1/chat/completions (OpenAI format) and /v1/messages (Anthropic format).

Append :private to a private-capable model ID — or send "private": true in the request body — to route into a hardware-secured enclave. The gateway verifies the enclave's attestation and fail-closes (422 confidential_unavailable if the model has no TEE slot, 503 attestation_unverified if attestation can't be verified). Verified responses carry X-Confidential-Verified, X-Confidential-Signer, and X-Confidential-Tcb headers.

curl -X POST https://llm.bankr.bot/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: bk_YOUR_API_KEY" \
-d '{
"model": "glm-5.2:private",
"messages": [{"role": "user", "content": "Hello!"}]
}'
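
The routing and verification details above could be handled on the client side roughly as follows. All helper names are hypothetical, and the header values are not documented here, so the sketch only collects whichever confidential headers are present without interpreting them.

```python
from typing import Optional

CONFIDENTIAL_HEADERS = (
    "X-Confidential-Verified",
    "X-Confidential-Signer",
    "X-Confidential-Tcb",
)

# Documented fail-closed status codes and their error names.
# A 503 may have other causes; this only names the documented one.
PRIVATE_FAILURES = {
    422: "confidential_unavailable",
    503: "attestation_unverified",
}

def private_model_id(model: str) -> str:
    """Append the :private suffix unless it is already there."""
    return model if model.endswith(":private") else f"{model}:private"

def confidential_headers(headers: dict[str, str]) -> dict[str, str]:
    """Pick out the confidential headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        name: lowered[name.lower()]
        for name in CONFIDENTIAL_HEADERS
        if name.lower() in lowered
    }

def private_failure(status: int) -> Optional[str]:
    """Map a documented fail-closed status code to its error name, if any."""
    return PRIVATE_FAILURES.get(status)
```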

Attestation Report

GET /v1/attestation/report

Returns the provider's raw attestation report (Intel TDX quote + ed25519 signing key) so a client can independently verify the enclave. See Private Inference for the full flow.


Health Check

GET /health

Check gateway and provider health. No authentication required.

Response

{
"status": "ok",
"providers": {
"vertexGemini": true,
"vertexClaude": true,
"openrouter": true
}
}

Status codes:

  • 200 — At least one provider healthy
  • 503 — All providers unavailable
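
A monitoring script might reduce the health response to the list of providers that are down. This is a sketch only; unhealthy_providers is a hypothetical helper, and it assumes every entry under providers is a boolean as in the sample.

```python
def unhealthy_providers(health: dict) -> list[str]:
    """Names of providers reported as unhealthy, sorted alphabetically."""
    return sorted(name for name, ok in health.get("providers", {}).items() if not ok)
```

An empty list means every listed provider is healthy; per the status codes above, the endpoint still returns 200 as long as at least one provider is up.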

Error Responses

401 Unauthorized

{
"error": {
"message": "Unauthorized",
"type": "auth_error"
}
}

429 Rate Limited

{
"error": {
"message": "Too many requests, please try again later.",
"type": "rate_limit_error"
}
}

500 Server Error

{
"error": {
"message": "Internal server error",
"type": "server_error"
}
}
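
All three error bodies share the same envelope, so a client can parse them uniformly. The sketch below does that and applies one possible retry policy; treating rate_limit_error and server_error as retryable is an assumption of this example, not documented gateway guidance.

```python
import json

# Assumed client-side policy: retry rate limits and server errors, never auth errors.
RETRYABLE_TYPES = {"rate_limit_error", "server_error"}

def parse_error(body: str) -> tuple[str, str]:
    """Return (type, message) from an error response body."""
    error = json.loads(body).get("error", {})
    return error.get("type", "unknown"), error.get("message", "")

def should_retry(error_type: str) -> bool:
    """Decide whether a request that failed with this error type is worth retrying."""
    return error_type in RETRYABLE_TYPES
```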

Usage

Get Usage Summary

GET /v1/usage?days=30

Returns aggregated token usage and cost breakdown for the authenticated API key. Requires authentication.

Query Parameters

Parameter   Type     Default   Description
days        number   30        Number of days to aggregate (1–90)

Response

{
"object": "usage_summary",
"days": 30,
"startDate": "2026-01-28T00:00:00.000Z",
"endDate": "2026-02-27T00:00:00.000Z",
"totals": {
"totalRequests": 1981,
"totalInputTokens": 489789,
"totalOutputTokens": 631794,
"totalCacheReadInputTokens": 53460194,
"totalCacheWriteInputTokens": 12555591,
"totalTokens": 67137368,
"totalCost": 248.38,
"totalCacheCost": 208.01
},
"byModel": [
{
"model": "claude-opus-4.8",
"provider": "vertex-claude",
"requests": 1574,
"inputTokens": 6250,
"outputTokens": 500097,
"cacheReadInputTokens": 27309491,
"cacheWriteInputTokens": 7474005,
"totalTokens": 35289843,
"totalCost": 218.70,
"cacheCost": 181.10
}
]
}
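
To show how the totals relate to each other, the sketch below derives a few figures from a usage summary. In the sample response the four token counters add up to totalTokens; whether that always holds is an assumption, as are the helper name summarize_usage and the derived keys it returns.

```python
def summarize_usage(summary: dict) -> dict:
    """Derive a few convenience figures from a /v1/usage response."""
    totals = summary["totals"]
    component_tokens = (
        totals["totalInputTokens"]
        + totals["totalOutputTokens"]
        + totals["totalCacheReadInputTokens"]
        + totals["totalCacheWriteInputTokens"]
    )
    total_cost = totals["totalCost"]
    requests = totals["totalRequests"]
    return {
        # Sum of the four token counters, for comparison with totalTokens.
        "componentTokens": component_tokens,
        # Fraction of spend attributed to cache reads and writes.
        "cacheCostShare": totals["totalCacheCost"] / total_cost if total_cost else 0.0,
        # Average cost of one request over the window.
        "costPerRequest": total_cost / requests if requests else 0.0,
    }
```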

Credits

Get Credit Balance

GET /v1/credits

Returns the current LLM credit balance for the API key's wallet. Requires authentication.

Use this to check available capacity before relying on the gateway. The effectiveBalanceUsd field is the most accurate measure of available balance because it nets out in-flight usage that hasn't been deducted yet. Balances are read directly from the database (not from a cached value) for accuracy.

Request

curl https://llm.bankr.bot/v1/credits \
-H "X-API-Key: bk_YOUR_API_KEY"

Response

{
"object": "credit_balance",
"balanceUsd": 12.34,
"effectiveBalanceUsd": 11.20,
"undeductedCostUsd": 1.14
}

Fields

Field                 Type     Description
balanceUsd            number   Total spendable credit on the wallet, in USD.
effectiveBalanceUsd   number   Available balance after subtracting in-flight usage not yet deducted. Floored at 0. Use this for capacity decisions.
undeductedCostUsd     number   Cost of in-flight/served requests not yet deducted from balanceUsd (the amount subtracted to derive effectiveBalanceUsd).

Requests are rejected with 402 Payment Required once the effective balance is exhausted. There are no per-key spending caps — the balance shown is the full credit available to the key.
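
The relationship between the three fields can be written out as a short sketch. The function names are hypothetical, and estimated_cost_usd is a figure the caller would have to supply; the gateway does not return it.

```python
def effective_balance(balance_usd: float, undeducted_cost_usd: float) -> float:
    """Balance minus in-flight usage, floored at 0 as described in the fields above."""
    return max(0.0, balance_usd - undeducted_cost_usd)

def has_capacity(credits: dict, estimated_cost_usd: float) -> bool:
    """Check a /v1/credits response against the caller's own cost estimate."""
    return credits["effectiveBalanceUsd"] >= estimated_cost_usd
```

With the sample response, 12.34 minus 1.14 gives the reported effective balance of 11.20.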


Streaming

Both /v1/chat/completions and /v1/messages support streaming responses when "stream": true is set in the request body:

curl -X POST https://llm.bankr.bot/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: bk_YOUR_API_KEY" \
-d '{
"model": "claude-opus-4.8",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'

Streaming uses Server-Sent Events (SSE) format.
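
A minimal sketch of reading an SSE stream in Python, following the general SSE framing (data: lines grouped into events by blank lines). The name iter_sse_data is hypothetical, and the shape of each payload depends on the API format in use, so the sketch yields raw strings and leaves JSON decoding to the caller.

```python
from typing import Iterable, Iterator

def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a stream of text lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:
        # Lenient choice: flush a final event that lacks a trailing blank line.
        yield "\n".join(buffer)
```

The function accepts any iterable of decoded lines, so it can be fed from a file, a socket wrapper, or a list in tests.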