Prompt templates


Use model-level transformations to dynamically customize LLM request parameters based on request context such as headers, user identity, or other runtime information. Agentgateway uses CEL (Common Expression Language) expressions to evaluate and set LLM request fields at runtime.

About LLM transformations

Model-level transformations allow you to dynamically compute LLM request fields using CEL expressions that can reference incoming request headers, existing request fields, and other context. This is useful for enforcing per-user policies, customizing model behavior based on caller identity, and applying conditional request modifications without changing client code.

To learn more about CEL syntax and functions, see the official CEL documentation.

ℹ️
Try out CEL expressions in the built-in CEL playground in the agentgateway admin UI before using them in your configuration.

Before you begin

Install the agentgateway binary.

Conditionally set max tokens based on user identity

Use a CEL expression in the model-level transformation field to dynamically set max_tokens based on the caller’s identity from a request header. This example gives admin users a higher token limit than regular users.

cat <<'EOF' > config.yaml
# yaml-language-server: $schema=https://agentgateway.dev/schema/config

llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      max_tokens: "request.headers['x-user-id'] == 'admin' ? 100 : 10"
EOF

With this configuration, requests that include the header x-user-id: admin can receive up to 100 completion tokens, while all other requests are capped at 10.

Dynamic prompt templates

Dynamic templates use CEL transformations to inject variables from the request context into prompts. This is ideal for personalizing prompts with user identity, adding request metadata, or applying conditional prompt modification based on headers or claims.

ℹ️
JWT claims in transformations require JWT authentication to be configured. See the authentication documentation for setup instructions.
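
As a sketch only: assuming JWT authentication is already configured and verified claims are exposed to CEL expressions (the `jwt.claims` map below is an assumption for illustration, not a documented variable name; check the CEL reference for the actual binding), a tier claim could drive the token limit instead of a header:

```yaml
# Hypothetical sketch: assumes verified JWT claims are available to CEL
# as `jwt.claims`; verify the variable name in the CEL reference.
llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      max_tokens: "jwt.claims['tier'] == 'premium' ? 4096 : 256"
```

Claims are preferable to plain headers for this kind of policy because they are cryptographically verified rather than caller-supplied.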

Inject user identity from headers

Configure transformations to inject user identity from request headers into the prompt.

# yaml-language-server: $schema=https://agentgateway.dev/schema/config
binds:
- port: 3000
  listeners:
  - routes:
    - backends:
      - ai:
          name: openai
          provider:
            openAI:
              model: gpt-3.5-turbo
      policies:
        backendAuth:
          key: "$OPENAI_API_KEY"
        transformations:
          request:
            body: |
              json(request.body).with(body,
                {
                  "model": body.model,
                  "messages": [{"role": "system", "content": "You are assisting user: " + default(request.headers["x-user-id"], "anonymous")}]
                    + body.messages
                }
              ).toJson()

Send a request with an x-user-id header and verify that the system message in the forwarded request attributes the conversation to that user.

curl -s http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-user-id: alice" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Tell me a story"}]
  }' | jq .

In the forwarded request, the injected system message identifies the caller as "alice"; requests without the x-user-id header fall back to "anonymous".

Available CEL variables

You can use these variables in your CEL transformation expressions.

| Variable | Description | Example |
|---|---|---|
| request.headers["name"] | Request header values | request.headers["x-user-id"] |
| request.path | Request path | request.path returns / |
| request.method | HTTP method | request.method returns POST |
| llmRequest.max_tokens | Original max_tokens from the request | min(llmRequest.max_tokens, 100) |
| llmRequest.model | Requested model name | llmRequest.model |

For a complete list of available variables and functions, see the CEL reference documentation.
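
These variables can be combined in a single expression. For example, the following fragment (a sketch that uses only the variables and the min function listed above) respects the client's requested max_tokens but never exceeds a server-side cap, with a tighter cap for non-admin callers:

```yaml
llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      # Admins: client value capped at 1024; everyone else: capped at 128.
      max_tokens: "request.headers['x-user-id'] == 'admin' ? min(llmRequest.max_tokens, 1024) : min(llmRequest.max_tokens, 128)"
```

Using min with llmRequest.max_tokens lets clients request fewer tokens than the cap while preventing them from exceeding it.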

Common transformation patterns

Cap token usage

Enforce a maximum token limit regardless of what the client requests.

llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      max_tokens: "min(llmRequest.max_tokens, 1024)"

Set temperature based on headers

Allow callers to control creativity through a header while enforcing bounds.

llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      temperature: "request.headers['x-creativity'] == 'high' ? 0.9 : 0.1"

Combine multiple transformations

Apply several field-level transformations in a single configuration.

llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      max_tokens: "request.headers['x-user-tier'] == 'premium' ? 4096 : 256"
      temperature: "request.headers['x-user-tier'] == 'premium' ? 0.8 : 0.3"
