Virtual key management

Verified: The code examples on this page have been automatically tested.

Issue API keys, also known as virtual keys, to users or applications and control their token usage.

About

Virtual key management allows you to issue API keys to users or applications, each with independent tracking and cost controls. Agentgateway achieves this by composing existing capabilities:

  • API key authentication: Identify incoming requests by API key
  • Token-based rate limiting: Enforce token budgets
  • Observability metrics: Track per-key spending and usage

How virtual keys work

  flowchart TD
  A[Request arrives with API key] --> B[Validate API key]
  B --> C{Key valid?}
  C -->|Yes| D[Check token budget]
  D --> E{Budget available?}
  E -->|Yes| F[Forward to LLM]
  F --> G[Track token usage]
  G --> H[Deduct from budget]
  E -->|No| I[Reject with 429]
  C -->|No| J[Reject with 401]
  subgraph refill["Budget refills periodically"]
    H
  end
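
The decision flow above can be sketched as a small in-memory shell model. This is only an illustration, not how agentgateway is implemented: the names `VALID_KEYS`, `BUDGET`, `STATUS`, and `handle_request` are hypothetical, and clamping the budget at zero when a response overshoots it is an assumption.

```shell
# Hypothetical model of the flowchart; not agentgateway's implementation.
VALID_KEYS="sk-alice-abc123def456 sk-bob-xyz789uvw012"
BUDGET=100   # tokens left in the shared budget
STATUS=""    # HTTP status chosen for the most recent request

# handle_request KEY TOKENS_USED
# Sets STATUS (and prints it); deducts TOKENS_USED from BUDGET on success.
handle_request() {
  local key="$1" used="$2" candidate valid=0
  for candidate in $VALID_KEYS; do
    if [ "$candidate" = "$key" ]; then valid=1; fi
  done
  if [ "$valid" -eq 0 ]; then
    STATUS=401                      # unknown key
  elif [ "$BUDGET" -le 0 ]; then
    STATUS=429                      # budget exhausted
  else
    BUDGET=$((BUDGET - used))       # usage is known only after the response
    if [ "$BUDGET" -lt 0 ]; then BUDGET=0; fi   # assumption: clamp at zero
    STATUS=200
  fi
  echo "$STATUS"
}

handle_request sk-alice-abc123def456 60   # prints 200
handle_request sk-bob-xyz789uvw012 60     # prints 200
handle_request sk-alice-abc123def456 10   # prints 429
handle_request invalid-key 10             # prints 401
```

Because the budget is checked before the request and deducted afterwards, the last request admitted can push usage past the limit in this model.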

Before you begin

Install the agentgateway binary.

Set up virtual keys

Step 1: Configure API key authentication

Create a configuration with API key authentication. This example creates two virtual keys for Alice and Bob.

cat <<'EOF' > config.yaml
# yaml-language-server: $schema=https://agentgateway.dev/schema/config

llm:
  policies:
    apiKey:
      mode: strict
      keys:
      - key: sk-alice-abc123def456
        metadata:
          user: alice
      - key: sk-bob-xyz789uvw012
        metadata:
          user: bob
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
EOF

Setting        Description
apiKey.mode    Set to strict to require a valid API key for all requests. Use optional to allow unauthenticated requests.
apiKey.keys    List of API keys. Each key has a key value and optional metadata.
key            The API key value that users include in the Authorization: Bearer <key> header.
metadata       Optional metadata associated with the key, such as a user identifier or tier.
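
The key values in this example are fixed strings. For real keys you would likely want random values. A minimal sketch, assuming a `sk-<user>-<random hex>` naming scheme that mirrors the example keys. The function name `new_virtual_key` and the key format are hypothetical; agentgateway does not require any particular format.

```shell
# Hypothetical helper: prints a key of the form sk-<user>-<24 hex characters>.
new_virtual_key() {
  local user="$1" suffix
  # 12 random bytes rendered as 24 lowercase hex characters
  suffix="$(head -c 12 /dev/urandom | od -An -tx1 | tr -d ' \n')"
  printf 'sk-%s-%s\n' "$user" "$suffix"
}

new_virtual_key alice
```

Paste the generated value into the `key` field of the configuration and hand the same value to the user.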

Step 2: Start agentgateway

agentgateway -f config.yaml

Step 3: Test the virtual keys

  1. Send a request with Alice’s API key. Verify that the request succeeds.

    curl -s http://localhost:4000/v1/chat/completions \
      -H "Authorization: Bearer sk-alice-abc123def456" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello!"}]
      }' | jq .

    Example successful response:

    {
      "choices": [{
        "message": {
          "role": "assistant",
          "content": "Hello! How can I help you today?"
        }
      }],
      "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 9,
        "total_tokens": 19
      }
    }
  2. Send a request without a valid API key. Verify that the request is rejected with a 401 status.

    curl -s -o /dev/null -w "%{http_code}" http://localhost:4000/v1/chat/completions \
      -H "Authorization: Bearer invalid-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

    Expected output (the command prints only the HTTP status code):

    401

Add a global token budget

To add a token budget that limits total token usage across all keys, use the routing-based configuration format with localRateLimit. Local rate limits apply to the gateway as a whole, not per key.

Note: Rate limiting requires the binds/listeners/routes configuration format because localRateLimit is an HTTP-level policy. For more information, see the Routing-based configuration guide.

cat <<'EOF' > config.yaml
# yaml-language-server: $schema=https://agentgateway.dev/schema/config

binds:
- port: 4000
  listeners:
  - routes:
    - backends:
      - ai:
          name: openai
          provider:
            openAI:
              model: gpt-3.5-turbo
      policies:
        apiKey:
          mode: strict
          keys:
          - key: sk-alice-abc123def456
            metadata:
              user: alice
          - key: sk-bob-xyz789uvw012
            metadata:
              user: bob
        backendAuth:
          key: "$OPENAI_API_KEY"
        localRateLimit:
        - maxTokens: 100000
          tokensPerFill: 100000
          fillInterval: 86400s
          type: tokens
EOF

Setting          Description
localRateLimit   Token-based rate limiting applied to all requests through this route.
maxTokens        The maximum number of tokens available in the budget.
tokensPerFill    The number of tokens added during each refill.
fillInterval     The interval between refills. Use 86400s for a daily budget.
type             Set to tokens for token-based limits. Use requests for request-based limits.
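
To see how maxTokens, tokensPerFill, and fillInterval interact, the following sketch computes the budget available after some elapsed time. It assumes a conventional token bucket in which refills arrive in whole steps at each interval and the balance is capped at maxTokens. The function name `tokens_available` is hypothetical, and the gateway's exact refill timing might differ.

```shell
# tokens_available REMAINING ELAPSED_SECONDS MAX_TOKENS TOKENS_PER_FILL FILL_INTERVAL_SECONDS
# Assumption: refills happen in whole steps and the balance never exceeds MAX_TOKENS.
tokens_available() {
  local remaining="$1" elapsed="$2" max="$3" per_fill="$4" interval="$5"
  local refills=$((elapsed / interval))            # completed refill intervals
  local total=$((remaining + refills * per_fill))
  if [ "$total" -gt "$max" ]; then total="$max"; fi
  echo "$total"
}

# Values from the configuration above: 100000 tokens, refilled every 86400s.
tokens_available 25000 3600 100000 100000 86400    # one hour in: prints 25000
tokens_available 25000 86400 100000 100000 86400   # after a full day: prints 100000
```

Setting tokensPerFill equal to maxTokens, as in the example configuration, makes each refill restore the full budget.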

For more details on rate limiting, see Control spend.

Monitor per-key spending

Track token usage and spending for each virtual key using Prometheus metrics exposed by agentgateway.

  1. Access the agentgateway metrics endpoint.

    curl http://localhost:15000/metrics
  2. Query token usage metrics.

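    # Total tokens consumed over the last 24 hours.
    # Each side is summed first: input and output series carry different
    # gen_ai_token_type labels, so adding them directly would match nothing.
    sum(increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[24h]))
    +
    sum(increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="output"}[24h]))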
  3. Calculate costs by multiplying token counts by your provider’s pricing. For example, with OpenAI GPT-3.5:

    # Estimated cost over the last 24 hours
    # (assuming $0.50 per 1M input tokens, $1.50 per 1M output tokens).
    # increase() returns the total over the window; rate() would return a per-second average.
    (sum(increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[24h])) / 1000000) * 0.50
    +
    (sum(increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="output"}[24h])) / 1000000) * 1.50
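
The same arithmetic can be checked offline once you have token counts. A minimal sketch, assuming the example prices above as defaults. The function name `estimate_cost` is hypothetical, and the prices should be replaced with your provider's current rates.

```shell
# estimate_cost INPUT_TOKENS OUTPUT_TOKENS [INPUT_PRICE_PER_1M] [OUTPUT_PRICE_PER_1M]
# Prints the estimated cost in dollars, rounded to two decimal places.
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" \
      -v in_price="${3:-0.50}" -v out_price="${4:-1.50}" \
    'BEGIN { printf "%.2f\n", (in_tok / 1000000) * in_price + (out_tok / 1000000) * out_price }'
}

estimate_cost 2000000 1000000   # 2M input + 1M output tokens: prints 2.50
```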

For more information on cost tracking, see the cost tracking guide.
