Traffic splitting
Set up weight-based routing between multiple apps for A/B testing, traffic splitting, and canary deployments.
About A/B testing and traffic splitting
A/B testing, traffic splitting, and canary deployments are techniques for gradually introducing changes by distributing traffic across multiple versions of an app or service based on weight percentages.
Common use cases:
- A/B testing: Compare two versions of an app by routing a percentage of traffic to each version to measure performance, user engagement, or business metrics.
- Traffic splitting: Distribute load across multiple backends, such as different LLM models or providers, to balance cost, performance, or capacity.
- Canary deployments: Gradually roll out a new version of your app by routing a small percentage of traffic to the new version, then increasing the percentage as confidence grows.
These patterns use weighted backendRefs in HTTPRoute (a standard Gateway API feature) to control the percentage of requests sent to each backend. Unlike failover, which uses priority groups to switch between backends when one fails, traffic splitting distributes traffic based on static weight ratios.
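Because the weights are relative, they do not have to add up to 100: each backend receives its weight divided by the sum of all weights in the same backendRefs list. The following sketch works through that arithmetic. The `weight_shares` helper is a hypothetical name used only for illustration and is not part of agentgateway or kubectl.

```shell
# Convert relative backendRefs weights into expected traffic percentages.
# Usage: weight_shares name=weight [name=weight ...]
# (weight_shares is an illustrative helper, not an agentgateway command.)
weight_shares() {
  printf '%s\n' "$@" | awk -F= '
    { name[NR] = $1; weight[NR] = $2; total += $2 }
    END {
      # If every weight is 0, no backend receives traffic.
      if (total == 0) { print "no traffic: all weights are 0"; exit }
      for (i = 1; i <= NR; i++)
        printf "%s %.1f%%\n", name[i], 100 * weight[i] / total
    }'
}

# The 10/10/80 split used later in this guide:
weight_shares helloworld-v1=10 helloworld-v2=10 helloworld-v3=80
```

Weights of 1 and 3 would give the same result as 25 and 75, because only the ratio matters.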
Before you begin
Follow the Get started guide to install agentgateway.
Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.
Get the external address of the gateway and save it in an environment variable.
export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system http -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
echo $INGRESS_GW_ADDRESS
Alternatively, for local testing, port-forward the gateway proxy on port 8080.
kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 8080:80
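A load balancer address can take some time to be assigned, and with port-forwarding the variable is not set at all. As a small convenience, a script could fall back to the local port-forward address when `INGRESS_GW_ADDRESS` is empty. The `gateway_address` function below is a hypothetical helper for illustration, and it assumes the port-forward on port 8080 described above.

```shell
# Print the address to send requests to: the load balancer address if it is
# set and non-empty, otherwise the local port-forward address.
# (gateway_address is an illustrative helper, not part of agentgateway.)
gateway_address() {
  printf '%s\n' "${INGRESS_GW_ADDRESS:-localhost:8080}"
}

gateway_address
```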
Example 1: A/B testing with multiple app versions
This example demonstrates A/B testing and canary deployments by distributing traffic across 3 versions of the Helloworld sample app.
Deploy the Helloworld sample app
Create the helloworld namespace.
kubectl create namespace helloworld
Deploy the Helloworld sample apps.
kubectl -n helloworld apply -f https://raw.githubusercontent.com/solo-io/gloo-edge-use-cases/main/docs/sample-apps/helloworld.yaml
Example output:

service/helloworld-v1 created
service/helloworld-v2 created
service/helloworld-v3 created
deployment.apps/helloworld-v1 created
deployment.apps/helloworld-v2 created
deployment.apps/helloworld-v3 created

Verify that the Helloworld pods are up and running.
kubectl get pods -n helloworld
Example output:

NAME                            READY   STATUS    RESTARTS   AGE
helloworld-v1-5c457458f-rfkc7   3/3     Running   0          30s
helloworld-v2-6594c54f6b-8dvjp  3/3     Running   0          29s
helloworld-v3-8576f76d87-czdll  3/3     Running   0          29s
Set up weighted routing
Create an HTTPRoute resource for the traffic.split.example domain that routes 10% of the traffic to helloworld-v1, 10% to helloworld-v2, and 80% to helloworld-v3. This configuration demonstrates a canary deployment pattern where version 3 (the stable version) receives most traffic while versions 1 and 2 (canary versions) receive smaller amounts for testing.

kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: traffic-split
  namespace: helloworld
spec:
  parentRefs:
  - name: http
    namespace: agentgateway-system
  hostnames:
  - traffic.split.example
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: helloworld-v1
      port: 5000
      weight: 10
    - name: helloworld-v2
      port: 5000
      weight: 10
    - name: helloworld-v3
      port: 5000
      weight: 80
EOF

Setting | Description
spec.parentRefs.name | The name and namespace of the gateway resource that serves the route. In this example, you use the gateway that you created when you set up the Sample app.
spec.hostnames | The hostname for which you want to apply traffic splitting.
spec.rules.matches.path | The path prefix to match on. In this example, / is used.
spec.rules.backendRefs | A list of services you want to forward traffic to. Use the weight option to define the amount of traffic that you want to forward to each service.

Verify that the HTTPRoute is applied successfully.
kubectl get httproute/traffic-split -n helloworld -o yaml
Send a few requests to the /hello path. Verify that you see responses from all 3 Helloworld apps, and that most responses are returned from helloworld-v3.
If you use the load balancer address:
for i in {1..20}; do curl -i http://$INGRESS_GW_ADDRESS:80/hello \
  -H "host: traffic.split.example:8080"; done
If you use port-forwarding on localhost:
for i in {1..20}; do curl -i localhost:8080/hello \
  -H "host: traffic.split.example"; done
Example output:

HTTP/1.1 200 OK
server: envoy
date: Wed, 12 Mar 2025 20:59:35 GMT
content-type: text/html; charset=utf-8
content-length: 60
x-envoy-upstream-service-time: 110

Hello version: v3, instance: helloworld-v3-55bfdf76cf-nv545

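Reading 20 responses by eye makes the split hard to judge. One way to summarize it, sketched here, is to extract the version from each response body and count the occurrences. The `count_versions` function is a hypothetical helper, and the sample lines are made up in the response format shown above so that the snippet runs without a cluster.

```shell
# Count responses per Helloworld version from response bodies read on stdin.
# Prints one "<version> <count>" line per version, sorted by version.
# (count_versions is an illustrative helper, not part of agentgateway.)
count_versions() {
  grep -o 'version: v[0-9]*' | sort | uniq -c | awk '{ print $3, $1 }'
}

# Made-up sample bodies standing in for real responses:
printf '%s\n' \
  'Hello version: v3, instance: sample-a' \
  'Hello version: v1, instance: sample-b' \
  'Hello version: v3, instance: sample-c' \
  'Hello version: v3, instance: sample-d' | count_versions

# Against a live gateway with port-forwarding, the loop output could be piped in:
# for i in {1..20}; do curl -s localhost:8080/hello \
#   -H "host: traffic.split.example"; done | count_versions
```

With only 20 requests, the observed counts can deviate noticeably from the configured 10/10/80 ratio. A larger number of requests gives a closer match.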
Example 2: A/B testing with LLM models
This example demonstrates traffic splitting for LLM workloads, distributing requests across multiple models or providers for cost optimization or A/B testing.
Set up weighted routing for LLM models
Create separate AgentgatewayBackend resources for each model you want to include in the traffic split.
This example creates two backends: one for the cheaper gpt-4o-mini model and one for the more capable gpt-4o model. Both backends reference the openai-secret secret for authentication, so make sure that this secret exists in the agentgateway-system namespace before you apply the resources.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai-mini-backend
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-4o-mini
  policies:
    auth:
      secretRef:
        name: openai-secret
---
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai-premium-backend
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-4o
  policies:
    auth:
      secretRef:
        name: openai-secret
EOF

Create an HTTPRoute resource with weighted backendRefs to distribute traffic between the two backends. This example routes 80% of traffic to the cheaper gpt-4o-mini model and 20% to the more capable gpt-4o model, allowing you to optimize costs while testing the premium model’s performance.

kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: test
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: agentgateway-proxy
    namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /test
    backendRefs:
    - name: openai-mini-backend
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
      weight: 80
    - name: openai-premium-backend
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
      weight: 20
EOF

Setting | Description
spec.rules[].backendRefs[].weight | The relative weight for traffic distribution. In this example, weights of 80 and 20 result in an 80/20 traffic split. The default weight is 1 if not specified.

Send multiple requests to observe the traffic distribution. Do not specify a model in your request. The HTTPRoute selects the backend, and therefore the model, according to the backend weights (80% to gpt-4o-mini, 20% to gpt-4o).
If you use the load balancer address:
for i in {1..10}; do
  curl -s "$INGRESS_GW_ADDRESS/test" \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "What is 2+2?"}]}' | \
    jq -r '.model'
done
If you use port-forwarding on localhost:
for i in {1..10}; do
  curl -s "localhost:8080/test" \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "What is 2+2?"}]}' | \
    jq -r '.model'
done
Example output showing ~80% gpt-4o-mini and ~20% gpt-4o responses:

gpt-4o-mini-2024-07-18
gpt-4o-mini-2024-07-18
gpt-4o-2024-08-06
gpt-4o-mini-2024-07-18
gpt-4o-mini-2024-07-18
gpt-4o-mini-2024-07-18
gpt-4o-mini-2024-07-18
gpt-4o-2024-08-06
gpt-4o-mini-2024-07-18
gpt-4o-mini-2024-07-18

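Once you have compared the two models, you might want to shift the split, for example to 50/50, without re-applying the whole route. One possible approach, sketched here under assumptions, is a JSON Patch that replaces the two weight fields. The `weight_patch` function is a hypothetical helper, and the patch paths assume that the two backends are the first two backendRefs entries of the first rule, as in the HTTPRoute above.

```shell
# Build a JSON Patch that sets the weights of the first two backendRefs
# in the first rule of an HTTPRoute.
# Usage: weight_patch <weight-of-backendRefs-0> <weight-of-backendRefs-1>
# (weight_patch is an illustrative helper, not part of agentgateway.)
weight_patch() {
  printf '[{"op":"replace","path":"/spec/rules/0/backendRefs/0/weight","value":%d},' "$1"
  printf '{"op":"replace","path":"/spec/rules/0/backendRefs/1/weight","value":%d}]\n' "$2"
}

# Print the patch for an even split:
weight_patch 50 50

# To apply it to the route from this example (requires cluster access):
# kubectl patch httproute test -n agentgateway-system --type=json \
#   -p "$(weight_patch 50 50)"
```

Editing the weights in the original manifest and re-applying it achieves the same result and keeps the manifest as the source of truth.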
Cleanup
You can remove the resources that you created in this guide.
Remove the backends and routes.
kubectl delete httproute traffic-split -n helloworld
kubectl delete httproute test -n agentgateway-system
kubectl delete AgentgatewayBackend openai-mini-backend openai-premium-backend -n agentgateway-system
Remove the Helloworld apps.
kubectl delete -n helloworld -f https://raw.githubusercontent.com/solo-io/gloo-edge-use-cases/main/docs/sample-apps/helloworld.yaml