Anthropic

Configure Anthropic (Claude models) as an LLM provider in agentgateway.

Configuration

Review the following example configuration.

# yaml-language-server: $schema=https://agentgateway.dev/schema/config

llm:
  models:
  - name: "*"
    provider: anthropic
    params:
      apiKey: "$ANTHROPIC_API_KEY"

Review the following table to understand this configuration.

| Setting | Description |
|---|---|
| `name` | The model name to match in incoming requests. When a client sends `"model": "<name>"`, the request is routed to this provider. Use `*` to match any model name. |
| `provider` | The LLM provider, set to `anthropic` for Claude models. |
| `params.model` | The specific Claude model to use. If set, this model is used for all requests. If not set, the request must include the model to use. |
| `params.apiKey` | The Anthropic API key for authentication. You can reference environment variables using the `$VAR_NAME` syntax. |
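
The matching rules in the table can be read as a small decision procedure. The following Python sketch only illustrates those rules as they are described here; it is not agentgateway's implementation. `ModelEntry` and `resolve_model` are hypothetical names, the `fast` entry is an invented example, and only exact names and the `*` wildcard are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelEntry:
    """Hypothetical stand-in for one item under llm.models."""
    name: str                           # model name to match, or "*" for any
    provider: str                       # for example "anthropic"
    params_model: Optional[str] = None  # corresponds to params.model


def resolve_model(entry: ModelEntry, requested: Optional[str]) -> Optional[str]:
    """Return the upstream model for a request, or None if the entry does not match."""
    if entry.name != "*" and entry.name != requested:
        return None  # the requested model name does not match this entry
    if entry.params_model:
        return entry.params_model  # params.model is used for all requests
    if not requested:
        raise ValueError("params.model is not set, so the request must include a model")
    return requested


wildcard = ModelEntry(name="*", provider="anthropic")
pinned = ModelEntry(name="fast", provider="anthropic", params_model="claude-opus-4-6")

print(resolve_model(wildcard, "claude-opus-4-6"))  # model taken from the request
print(resolve_model(pinned, "fast"))               # model taken from params.model
print(resolve_model(pinned, "other"))              # entry does not match
```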

Example request

After you start agentgateway with the configuration from the previous section, you can send a request to the `/v1/messages` endpoint. Agentgateway automatically adds the `x-api-key` authorization header and the `anthropic-version` header to the request, forwards it to the Anthropic API, and returns the response to the client.

curl -X POST http://localhost:4000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Example response:

{
  "model": "claude-opus-4-6",
  "usage": {
    "input_tokens": 9,
    "output_tokens": 21,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    },
    "service_tier": "standard"
  },
  "content": [
    {
      "text": "Hi there! How are you doing today? Is there anything I can help you with?",
      "type": "text"
    }
  ],
  "id": "msg_01QdUEuzvXfjLh1HfMQd4UHP",
  "type": "message",
  "role": "assistant",
  "stop_reason": "end_turn",
  "stop_sequence": null
}
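
For clients that call the gateway from code rather than from curl, the same request can be sketched with Python's standard library. This is a minimal illustration, assuming the gateway listens on `http://localhost:4000` as in the curl example. `build_request`, `extract_text`, and `send` are hypothetical helper names, not part of agentgateway.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:4000/v1/messages"  # assumed address, taken from the curl example


def build_request(model: str, prompt: str, max_tokens: int = 100) -> urllib.request.Request:
    """Build the POST request. No API key header is set here; the gateway adds it."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text of all content blocks whose type is "text"."""
    return "".join(
        block["text"] for block in response.get("content", []) if block.get("type") == "text"
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires a running gateway)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With agentgateway running, `extract_text(send(build_request("claude-opus-4-6", "Hello!")))` returns the assistant's reply as a single string.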

Token counting

Agentgateway supports Anthropic’s `count_tokens` API, which estimates token usage before you send a request. Agentgateway automatically adds the required `anthropic-version` header and formats the request for Anthropic’s API.

curl -X POST http://localhost:4000/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-6",
    "messages": [{"role": "user", "content": "Hello!"}],
    "system": "You are a helpful assistant."
  }'

Example response:

{
  "input_tokens": 15
}
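
One way to use this endpoint is as a guard before sending a large prompt. The sketch below assumes the gateway address from the curl example and a limit chosen by the caller. `count_tokens_body`, `within_budget`, and `count_tokens` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Optional

COUNT_URL = "http://localhost:4000/v1/messages/count_tokens"  # assumed address from the curl example


def count_tokens_body(model: str, messages: list, system: Optional[str] = None) -> dict:
    """Build a count_tokens body with the same fields as the curl example."""
    body = {"model": model, "messages": messages}
    if system is not None:
        body["system"] = system
    return body


def within_budget(count_response: dict, limit: int) -> bool:
    """Return True if the reported input_tokens does not exceed the caller's limit."""
    return count_response["input_tokens"] <= limit


def count_tokens(body: dict) -> dict:
    """POST the body to the gateway (requires a running gateway)."""
    request = urllib.request.Request(
        COUNT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


body = count_tokens_body(
    "claude-opus-4-6",
    [{"role": "user", "content": "Hello!"}],
    system="You are a helpful assistant.",
)
print(json.dumps(body))
print(within_budget({"input_tokens": 15}, limit=1000))
```

With agentgateway running, `within_budget(count_tokens(body), limit=1000)` performs the check against the live count.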

Extended thinking and reasoning

Extended thinking and reasoning lets Claude reason through complex problems before generating a response. You can opt in to extended thinking and reasoning by adding specific parameters to your request.

Note: Extended thinking and reasoning requires a Claude model that supports these features, such as claude-opus-4-6.

To opt in to extended thinking, include the thinking.type field in your request. You can also set the output_config.effort field to control how much reasoning the model applies.

The following values are supported:

thinking field

| `type` value | Additional fields | Behavior |
|---|---|---|
| `adaptive` | `output_config.effort` | The model decides whether to think and how much. Requires `output_config.effort` to be set. |
| `enabled` | `budget_tokens: <number>` | Explicitly enables thinking with a fixed token budget. Works standalone without `output_config`. |
| `disabled` | none | Explicitly disables thinking. |

output_config field

The `output_config` field has two independent sub-fields. You can set either or both.

| Sub-field | Description |
|---|---|
| `effort` | Controls the reasoning effort level. Accepted values: `low`, `medium`, `high`, `max`. |
| `format` | Constrains the response to a JSON schema. Set `type` to `json_schema` and provide a `schema` object. For more information, see Structured outputs. |
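
The constraints in the two tables can be checked on the client before a request is sent. The following sketch encodes only the rules listed above. `thinking_fields` is a hypothetical helper, and the gateway or the Anthropic API may enforce further limits that are not modeled here.

```python
from typing import Optional

EFFORT_LEVELS = {"low", "medium", "high", "max"}


def thinking_fields(
    mode: str,
    effort: Optional[str] = None,
    budget_tokens: Optional[int] = None,
) -> dict:
    """Return the request fields for a thinking mode, following the tables above."""
    if effort is not None and effort not in EFFORT_LEVELS:
        raise ValueError(f"unsupported effort: {effort}")
    fields: dict = {}
    if mode == "adaptive":
        if effort is None:
            raise ValueError("adaptive thinking requires output_config.effort")
        fields["thinking"] = {"type": "adaptive"}
    elif mode == "enabled":
        if budget_tokens is None or budget_tokens <= 0:
            raise ValueError("enabled thinking requires a positive budget_tokens")
        fields["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    elif mode == "disabled":
        fields["thinking"] = {"type": "disabled"}
    else:
        raise ValueError(f"unknown thinking type: {mode}")
    if effort is not None:
        fields["output_config"] = {"effort": effort}
    return fields


# Merge the returned fields into a request body next to model, max_tokens, and messages.
print(thinking_fields("adaptive", effort="high"))
print(thinking_fields("enabled", budget_tokens=2048))
print(thinking_fields("disabled"))
```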

The following example request uses adaptive extended thinking. Note that this setting requires the output_config.effort field to be set too.

curl "localhost:4000/v1/messages" -H content-type:application/json -d '{
  "model": "claude-opus-4-6",
  "max_tokens": 1024,
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "high"
  },
  "messages": [
    {
      "role": "user",
      "content": "Explain the trade-offs between consistency and availability in distributed systems."
    }
  ]
}' | jq

Example output:

{
  "id": "msg_01HVEzWf4NJrsKyVeEUDnHNW",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-6",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me think through the trade-offs between consistency and availability..."
    },
    {
      "type": "text",
      "text": "# Consistency vs. Availability in Distributed Systems\n\n..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 21,
    "output_tokens": 1024
  }
}
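
When thinking is active, the `content` array mixes `thinking` and `text` blocks, as in the output above. A client that only wants to display the final answer can separate the two. This is a minimal sketch over the response shape shown here; `split_blocks` is a hypothetical helper.

```python
from typing import Tuple


def split_blocks(response: dict) -> Tuple[str, str]:
    """Return (thinking, answer) by concatenating blocks of each type in order."""
    thinking_parts = []
    text_parts = []
    for block in response.get("content", []):
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        # Any other block type is ignored in this sketch.
    return "".join(thinking_parts), "".join(text_parts)


# Abbreviated copy of the example output above.
sample = {
    "content": [
        {"type": "thinking", "thinking": "Let me think through the trade-offs..."},
        {"type": "text", "text": "# Consistency vs. Availability"},
    ]
}
thinking, answer = split_blocks(sample)
print(answer)
```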

Structured outputs

Structured outputs constrain the model’s response to a specific JSON schema. Provide the schema definition in the `output_config.format` field of your request.

curl "localhost:4000/v1/messages" -H content-type:application/json -d '{
  "model": "claude-opus-4-6",
  "max_tokens": 256,
  "output_config": {
    "format": {
      "type": "json_schema",
      "schema": {
        "type": "object",
        "properties": {
          "answer": { "type": "string" },
          "confidence": { "type": "number" }
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  },
  "messages": [
    {
      "role": "user",
      "content": "Is the sky blue? Respond with your answer and a confidence score between 0 and 1."
    }
  ]
}' | jq

Example output:

{
  "id": "msg_01PsCxtLN1vftAKZgvWXhCan",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-6",
  "content": [
    {
      "type": "text",
      "text": "{\"answer\":\"Yes, the sky is blue during clear daytime conditions.\",\"confidence\":0.98}"
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 29,
    "output_tokens": 28
  }
}
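
The structured result arrives as a JSON string inside the `text` block, so the client still has to decode it. The following sketch decodes the example output above and checks the two required keys from the schema. `parse_structured` is a hypothetical helper, and it assumes that the first `text` block holds the whole JSON document.

```python
import json


def parse_structured(response: dict, required: tuple = ("answer", "confidence")) -> dict:
    """Decode the JSON carried in the first text block and check the required keys."""
    for block in response.get("content", []):
        if block.get("type") == "text":
            data = json.loads(block["text"])
            missing = [key for key in required if key not in data]
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return data
    raise ValueError("response contains no text block")


# Abbreviated copy of the example output above.
sample = {
    "content": [
        {
            "type": "text",
            "text": "{\"answer\":\"Yes, the sky is blue during clear daytime conditions.\",\"confidence\":0.98}",
        }
    ],
    "stop_reason": "end_turn",
}
result = parse_structured(sample)
print(result["answer"], result["confidence"])
```

If the text is not complete JSON, `json.loads` raises `json.JSONDecodeError`, so callers may want to catch that error as well.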

Connect to Claude Code

Connect to Claude Code locally to verify access to the Anthropic provider through agentgateway.

Get your Anthropic API key from the Anthropic Console and save it as an environment variable.
```
export ANTHROPIC_API_KEY="sk-ant-api03-your-actual-key-here"
```

Start agentgateway with the following configuration.

cat > config.yaml << 'EOF'
# yaml-language-server: $schema=https://agentgateway.dev/schema/config

llm:
  models:
  - name: "*"
    provider: anthropic
    params:
      apiKey: "$ANTHROPIC_API_KEY"
EOF

agentgateway -f config.yaml

In another terminal, configure Claude Code to use the agentgateway instance that is running on your localhost.
```
export ANTHROPIC_BASE_URL="http://localhost:4000"
```
Start Claude Code with the new configuration.
```
claude
```
Send a test request through Claude Code, such as Hello.

In the terminal where agentgateway is running, check the logs. Each request that Claude Code sends appears as a log entry. Claude Code continues to work normally while benefiting from any agentgateway features that you added, such as traffic management, security, and monitoring.

Example output:

2025-10-16T20:10:17.919575Z	info	request gateway=bind/3000 listener=listener0 route_rule=route0/default route=route0 endpoint=api.anthropic.com:443 src.addr=[::1]:59011 http.method=POST http.host=localhost http.path=/v1/messages?beta=true http.version=HTTP/1.1 http.status=200 gen_ai.operation.name=chat gen_ai.provider.name=anthropic gen_ai.request.model=claude-opus-4-6 gen_ai.response.model=claude-opus-4-6 gen_ai.usage.input_tokens=4734 gen_ai.usage.output_tokens=32 gen_ai.request.temperature=0 gen_ai.request.max_tokens=512 duration=1900ms
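
Because each log entry is a series of `key=value` pairs, token usage can be pulled out with a few lines of code. The sketch below parses a shortened copy of the example line. `parse_log_fields` is a hypothetical helper, and it assumes that values contain no whitespace, which holds for this sample but is not confirmed for every field that agentgateway can log.

```python
def parse_log_fields(line: str) -> dict:
    """Split a log line on whitespace and keep the key=value tokens."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")  # split on the first "=" only
        if sep and key:
            fields[key] = value
    return fields


# Shortened copy of the example log line above.
line = (
    "2025-10-16T20:10:17.919575Z\tinfo\trequest gateway=bind/3000 "
    "http.path=/v1/messages?beta=true http.status=200 "
    "gen_ai.request.model=claude-opus-4-6 "
    "gen_ai.usage.input_tokens=4734 gen_ai.usage.output_tokens=32 duration=1900ms"
)
fields = parse_log_fields(line)
total = int(fields["gen_ai.usage.input_tokens"]) + int(fields["gen_ai.usage.output_tokens"])
print(fields["gen_ai.request.model"], total)
```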
