Amazon Bedrock

Configure Amazon Bedrock as an LLM provider in agentgateway.

Authentication

Before you can use Bedrock as an LLM provider, you must authenticate by using the standard AWS authentication sources.

Configuration

Review the following example configuration.
# yaml-language-server: $schema=https://agentgateway.dev/schema/config

llm:
  models:
  - name: "*"
    provider: bedrock
    params:
      region: us-west-2
Review the following example configuration.
SettingDescription
nameThe model name to match in incoming requests. When a client sends "model": "<name>", the request is routed to this provider. Use * to match any model name.
providerThe LLM provider, set to bedrock for Amazon Bedrock models.
params.modelThe specific Bedrock model to use. If set, this model is used for all requests. If not set, the request must include the model to use.
params.regionThe AWS region where the Bedrock model is hosted.

Token counting

Bedrock supports token counting for Anthropic models via the count_tokens endpoint. Agentgateway automatically handles the required formatting for Bedrock’s count-tokens endpoint, including adding the max_tokens: 1 parameter and Base64 encoding the request body.

curl -X POST http://localhost:4000/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "system": "You are a helpful assistant."
  }'

Example response:

{
  "input_tokens": 15
}

Extended thinking and reasoning

Extended thinking and reasoning lets models reason through complex problems before generating a response. You can opt in to extended thinking and reasoning by adding specific parameters to your request. Agentgateway maps these parameters to Bedrock’s native format automatically.

ℹ️
Extended thinking and reasoning requires a Claude model that supports it, such as us.anthropic.claude-opus-4-20250514-v1:0.

Use the reasoning.effort field to control how much reasoning the model applies. The value is automatically mapped to a thinking budget.

reasoning.effort valueThinking budget
minimal or low1,024 tokens
medium2,048 tokens
high or xhigh4,096 tokens
curl "localhost:3000/model/us.anthropic.claude-opus-4-20250514-v1:0/converse" \
  -H content-type:application/json -d '{
  "model": "",
  "max_output_tokens": 5000,
  "input": "Explain the trade-offs between consistency and availability in distributed systems.",
  "reasoning": {
    "effort": "high"
  }
}' | jq

Use the reasoning_effort field to control how much reasoning the model applies. The value is automatically mapped to a thinking budget.

reasoning_effort valueThinking budget
minimal or low1,024 tokens
medium2,048 tokens
high or xhigh4,096 tokens

Note that max_tokens must be greater than the thinking budget, and the minimum thinking budget is 1,024 tokens.

curl "localhost:3000/v1/chat/completions" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 6000,
  "reasoning_effort": "high",
  "messages": [
    {
      "role": "user",
      "content": "Explain the trade-offs between consistency and availability in distributed systems."
    }
  ]
}' | jq

Structured outputs

Structured outputs constrain the model to respond with a specific JSON schema. You must provide the schema definition in your request.

Provide the JSON schema definition in the text.format field.

curl "localhost:3000/model/us.anthropic.claude-opus-4-20250514-v1:0/converse" \
  -H content-type:application/json -d '{
  "model": "",
  "max_output_tokens": 256,
  "input": "Is the sky blue? Respond with your answer and a confidence score between 0 and 1.",
  "text": {
    "format": {
      "type": "json_schema",
      "json_schema": {
        "name": "answer_schema",
        "schema": {
          "type": "object",
          "properties": {
            "answer": { "type": "string" },
            "confidence": { "type": "number" }
          },
          "required": ["answer", "confidence"],
          "additionalProperties": false
        }
      }
    }
  }
}' | jq

Provide the schema definition in the response_format field. Agentgateway translates this to Bedrock’s native format automatically.

curl "localhost:3000/v1/chat/completions" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 256,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "answer_schema",
      "schema": {
        "type": "object",
        "properties": {
          "answer": { "type": "string" },
          "confidence": { "type": "number" }
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  },
  "messages": [
    {
      "role": "user",
      "content": "Is the sky blue? Respond with your answer and a confidence score between 0 and 1."
    }
  ]
}' | jq
Agentgateway assistant

Ask me anything about agentgateway configuration, features, or usage.

Note: AI-generated content might contain errors; please verify and test all returned information.

Tip: one topic per conversation gives the best results. Use the + button in the chat header to start a new conversation.

Switching topics? Starting a new conversation improves accuracy.
↑↓ navigate select esc dismiss

What could be improved?

Your feedback helps us improve assistant answers and identify docs gaps we should fix.

Need more help? Join us on Discord: https://discord.gg/y9efgEmppm

Want to use your own agent? Add the Solo MCP server to query our docs directly. Get started here: https://search.solo.io/.