Ollama

Code examples on this page have been automatically tested and verified.

Configure Ollama to serve local models through agentgateway. Ollama runs on a machine outside your cluster, and agentgateway routes requests to it over the network.

Before you begin

  1. Install and set up an agentgateway proxy.
  2. Install and run Ollama on a machine accessible from your Kubernetes cluster.
  3. Get the IP address of the machine running Ollama.

Set up Ollama

  1. From the machine where you installed Ollama, make sure that you have at least one model pulled.

    ollama list

    If not, pull a model.

    ollama pull llama3.2
  2. Configure Ollama to accept external connections. By default, Ollama only listens on localhost. You can change this setting with the OLLAMA_HOST environment variable.

    export OLLAMA_HOST=0.0.0.0:11434
    ⚠️
    Binding Ollama to 0.0.0.0 exposes it on all network interfaces. Use firewall rules to restrict access to your Kubernetes cluster nodes only.
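    For example, on a host that uses ufw, you might allow port 11434 only from your cluster's node subnet. The 10.0.0.0/24 CIDR below is a placeholder; substitute your actual node CIDR.

```shell
# Hypothetical ufw rules: allow Ollama's port only from the cluster node
# subnet (replace 10.0.0.0/24 with your real node CIDR), deny everything else.
sudo ufw allow from 10.0.0.0/24 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
```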
  3. Restart Ollama to apply the new setting.

  4. Verify Ollama is accessible from the machine’s network address.

    curl http://<OLLAMA_IP>:11434/v1/models

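The /v1/models response is JSON, so a quick way to list just the model IDs is to filter it with jq. The payload below is an illustrative sample, not live output:

```shell
# Sample /v1/models payload (illustrative). Against a live instance you would run:
#   curl -s "http://<OLLAMA_IP>:11434/v1/models" | jq -r '.data[].id'
response='{"object":"list","data":[{"id":"llama3.2","object":"model"}]}'
echo "$response" | jq -r '.data[].id'
# → llama3.2
```

Any model you expect to route to should appear in this list.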
Configure agentgateway to reach Ollama

Because Ollama runs outside your Kubernetes cluster, you need a headless Service and EndpointSlice to give it a stable in-cluster DNS name.

  1. Get the IP address of the machine running Ollama.

    # macOS
    ipconfig getifaddr en0
    
    # Linux
    hostname -I | awk '{print $1}'
  2. Create a headless Service and EndpointSlice that point to the external Ollama instance. Replace <OLLAMA_IP> with the actual IP address.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: ollama
      namespace: agentgateway-system
    spec:
      type: ClusterIP
      clusterIP: None
      ports:
      - port: 11434
        targetPort: 11434
        protocol: TCP
    ---
    apiVersion: discovery.k8s.io/v1
    kind: EndpointSlice
    metadata:
      name: ollama
      namespace: agentgateway-system
      labels:
        kubernetes.io/service-name: ollama
    addressType: IPv4
    endpoints:
    - addresses:
      - <OLLAMA_IP>
    ports:
    - port: 11434
      protocol: TCP
    EOF
  3. Create an AgentgatewayBackend resource. The openai provider type is used because Ollama exposes an OpenAI-compatible API. The host and port fields point to the headless Service DNS name.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: ollama
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: llama3.2
          host: ollama.agentgateway-system.svc.cluster.local
          port: 11434
    EOF

    Review the following table to understand this configuration. For more information, see the API reference.

    Setting              Description
    ai.provider.openai   The OpenAI-compatible provider type. Ollama exposes an OpenAI-compatible API, so the openai type is used here.
    openai.model         The Ollama model to use. This must match a model you pulled with ollama pull.
    host                 The in-cluster DNS name of the headless Service pointing to the external Ollama instance.
    port                 The port Ollama listens on. The default is 11434.
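    The host value follows the standard Kubernetes Service DNS convention, <service>.<namespace>.svc.cluster.local. A quick sketch of how the name used above is composed:

```shell
# Compose the in-cluster DNS name for the headless Service created earlier
service="ollama"
namespace="agentgateway-system"
host="${service}.${namespace}.svc.cluster.local"
echo "$host"
# → ollama.agentgateway-system.svc.cluster.local
```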
  4. Create an HTTPRoute to expose the Ollama backend through the gateway.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: ollama
      namespace: agentgateway-system
    spec:
      parentRefs:
      - name: agentgateway-proxy
        namespace: agentgateway-system
      rules:
      - backendRefs:
        - name: ollama
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
    EOF
  5. Send a request to verify the setup.

    If your gateway has an external address, store it in the INGRESS_GW_ADDRESS environment variable and send the request directly:

    curl "$INGRESS_GW_ADDRESS" \
      -H "content-type: application/json" \
      -d '{
        "model": "llama3.2",
        "messages": [
          {
            "role": "user",
            "content": "Explain the benefits of running models locally."
          }
        ]
      }' | jq

    If your gateway does not have an external address, start a port-forward to the gateway in one terminal:

    kubectl port-forward -n agentgateway-system svc/agentgateway-proxy 8080:80

    In a second terminal, send the request:

    curl "localhost:8080" \
      -H "content-type: application/json" \
      -d '{
        "model": "llama3.2",
        "messages": [
          {
            "role": "user",
            "content": "Explain the benefits of running models locally."
          }
        ]
      }' | jq

    Example output:

    {
      "id": "chatcmpl-123",
      "object": "chat.completion",
      "created": 1727967462,
      "model": "llama3.2",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Running models locally provides complete data privacy, no API costs or rate limits, and consistent low latency without network dependencies."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 15,
        "completion_tokens": 32,
        "total_tokens": 47
      }
    }
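To pull just the assistant's reply out of a chat completion response, you can filter it with jq. The payload below is a trimmed sample; with a live gateway you would pipe the curl output instead:

```shell
# Trimmed chat completion sample (illustrative, not live output)
resp='{"choices":[{"index":0,"message":{"role":"assistant","content":"Local models keep data private."},"finish_reason":"stop"}]}'
echo "$resp" | jq -r '.choices[0].message.content'
# → Local models keep data private.
```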

Troubleshooting

Connection refused or 503 response

What’s happening:

Requests fail with a connection error or the gateway returns a 503 response.

Why it’s happening:

The Kubernetes cluster cannot reach the Ollama instance. This is usually caused by an incorrect IP in the EndpointSlice, a firewall blocking port 11434, or Ollama not configured to accept external connections.

How to fix it:

  1. Verify Ollama is reachable from the machine’s network address:

    curl http://<OLLAMA_IP>:11434/v1/models
  2. Check that the EndpointSlice contains the correct IP:

    kubectl get endpointslice ollama -n agentgateway-system -o yaml
  3. Test connectivity from inside the cluster:

    kubectl run -it --rm debug --image=curlimages/curl --restart=Never \
      -- curl http://ollama.agentgateway-system.svc.cluster.local:11434/v1/models

Model not found

What’s happening:

The request returns an error indicating the model is not available.

Why it’s happening:

The model specified in the request or the AgentgatewayBackend resource has not been pulled in Ollama.

How to fix it:

  1. List models available in Ollama:

    ollama list
  2. Pull the model if it is missing:

    ollama pull llama3.2
