Prompt templates


Use model-level transformations to dynamically customize LLM request parameters based on request context such as headers, user identity, or other runtime information. Agentgateway uses CEL (Common Expression Language) expressions to evaluate and set LLM request fields at runtime.

About LLM transformations

Model-level transformations allow you to dynamically compute LLM request fields using CEL expressions that can reference incoming request headers, existing request fields, and other context. This is useful for enforcing per-user policies, customizing model behavior based on caller identity, and applying conditional request modifications without changing client code.

To learn more about CEL syntax and functions, see the official CEL documentation.

ℹ️
Try out CEL expressions in the built-in CEL playground in the agentgateway admin UI before using them in your configuration.

Before you begin

Install the agentgateway binary.

Conditionally set max tokens based on user identity

Use a CEL expression in the model-level transformation field to dynamically set max_tokens based on the caller’s identity from a request header. This example gives admin users a higher token limit than regular users.

cat <<'EOF' > config.yaml
# yaml-language-server: $schema=https://agentgateway.dev/schema/config

llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      max_tokens: "request.headers['x-user-id'] == 'admin' ? 100 : 10"
EOF

With this configuration, requests that include the header x-user-id: admin can receive up to 100 completion tokens, while all other requests are capped at 10.

Dynamic prompt templates

Dynamic templates use CEL transformations to inject variables from the request context into prompts. This is ideal for personalizing prompts with user identity, adding request metadata, or applying conditional prompt modification based on headers or claims.

ℹ️
JWT claims in transformations require JWT authentication to be configured. See the authentication documentation for setup instructions.
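
As a sketch only: assuming JWT authentication is already configured and verified claims are exposed to CEL expressions (the `jwt.claims` map below is an assumption for illustration, not a documented variable name; check the CEL reference for the actual binding), a tier claim could drive the token limit instead of a header:

```yaml
# Hypothetical sketch: assumes verified JWT claims are available to CEL
# as `jwt.claims`; verify the variable name in the CEL reference.
llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      max_tokens: "jwt.claims['tier'] == 'premium' ? 4096 : 256"
```

Claims are preferable to plain headers for this kind of policy because they are cryptographically verified rather than caller-supplied.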

Inject user identity from headers

Configure transformations to inject user identity from request headers into the prompt.

# yaml-language-server: $schema=https://agentgateway.dev/schema/config
binds:
- port: 3000
  listeners:
  - routes:
    - backends:
      - ai:
          name: openai
          provider:
            openAI:
              model: gpt-3.5-turbo
      policies:
        backendAuth:
          key: "$OPENAI_API_KEY"
        transformations:
          request:
            body: |
              json(request.body).with(body,
                {
                  "model": body.model,
                  "messages": [{"role": "system", "content": "You are assisting user: " + default(request.headers["x-user-id"], "anonymous")}]
                    + body.messages
                }
              ).toJson()

Send a request with an x-user-id header and verify that the system message in the forwarded request attributes the conversation to that user.

curl -s http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-user-id: alice" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Tell me a story"}]
  }' | jq .

In the forwarded request, the injected system message identifies the caller as "alice"; requests without the x-user-id header fall back to "anonymous".

Available CEL variables

You can use these variables in your CEL transformation expressions.

| Variable | Description | Example |
|---|---|---|
| request.headers["name"] | Request header values | request.headers["x-user-id"] |
| request.path | Request path | request.path returns / |
| request.method | HTTP method | request.method returns POST |
| llmRequest.max_tokens | Original max_tokens from the request | min(llmRequest.max_tokens, 100) |
| llmRequest.model | Requested model name | llmRequest.model |

For a complete list of available variables and functions, see the CEL reference documentation.
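
These variables can be combined in a single expression. For example, the following fragment (a sketch that uses only the variables and the min function listed above) respects the client's requested max_tokens but never exceeds a server-side cap, with a tighter cap for non-admin callers:

```yaml
llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      # Admins: client value capped at 1024; everyone else: capped at 128.
      max_tokens: "request.headers['x-user-id'] == 'admin' ? min(llmRequest.max_tokens, 1024) : min(llmRequest.max_tokens, 128)"
```

Using min with llmRequest.max_tokens lets clients request fewer tokens than the cap while preventing them from exceeding it.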

Common transformation patterns

Cap token usage

Enforce a maximum token limit regardless of what the client requests.

llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      max_tokens: "min(llmRequest.max_tokens, 1024)"

Set temperature based on headers

Allow callers to control creativity through a header while enforcing bounds.

llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      temperature: "request.headers['x-creativity'] == 'high' ? 0.9 : 0.1"

Combine multiple transformations

Apply several field-level transformations in a single configuration.

llm:
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
    transformation:
      max_tokens: "request.headers['x-user-tier'] == 'premium' ? 4096 : 256"
      temperature: "request.headers['x-user-tier'] == 'premium' ? 0.8 : 0.3"
