# Ollama for local LLMs
Instead of a cloud LLM provider, you might want to use a local LLM provider such as Ollama for local development.
## Before you begin
1. As part of the AI Gateway setup, make sure that you set up the GatewayParameters resource to use a NodePort service.

   ```sh
   kubectl get GatewayParameters ai-gateway -n kgateway-system -o jsonpath='{.spec.kube.service.type}'
   ```

   Example output:

   ```
   NodePort
   ```
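   If the output is empty or shows a different service type, update the GatewayParameters resource before you continue. The following patch is a minimal sketch that assumes the same `spec.kube.service.type` field path as the check above; adjust it to match your setup.

   ```sh
   # Sketch: set the gateway's data plane service to NodePort.
   # Assumes the spec.kube.service.type field path from the check above.
   kubectl patch gatewayparameters ai-gateway -n kgateway-system \
     --type merge -p '{"spec":{"kube":{"service":{"type":"NodePort"}}}}'
   ```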
## Start Ollama locally
Start running an Ollama server as a local LLM provider.
1. Find your local IP address. The command depends on your operating system.

   ```sh
   # macOS
   ipconfig getifaddr en0

   # Linux
   ifconfig

   # Windows
   ipconfig
   ```

   Example output: Note the `inet 192.168.1.100` address.

   ```
   inet 192.168.1.100 netmask 255.255.255.0 broadcast 192.168.1.255
   ```
2. Set the IP address as an environment variable.

   ```sh
   export OLLAMA_HOST=192.168.1.100
   ```
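   On macOS, you can combine both steps in one line. This sketch assumes that your primary network interface is `en0`.

   ```sh
   # Sketch: set OLLAMA_HOST directly from the active interface (assumes macOS, en0)
   export OLLAMA_HOST=$(ipconfig getifaddr en0)
   echo "$OLLAMA_HOST"
   ```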
3. Start your local LLM provider.

   ```sh
   ollama serve
   ```

   Example output:

   ```
   time=2025-05-21T12:33:42.433+08:00 level=INFO source=routes.go:1205 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://192.168.181.210:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/zhengkezhou/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
   time=2025-05-21T12:33:42.436+08:00 level=INFO source=images.go:463 msg="total blobs: 12"
   time=2025-05-21T12:33:42.437+08:00 level=INFO source=images.go:470 msg="total unused blobs removed: 0"
   time=2025-05-21T12:33:42.437+08:00 level=INFO source=routes.go:1258 msg="Listening on 192.168.181.210:11434 (version 0.7.0)"
   time=2025-05-21T12:33:42.478+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="16.0 GiB" available="16.0 GiB"
   ```
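   Before you route traffic through the gateway, you might confirm that the server responds and that the model you plan to use is available locally. This sketch assumes the `llama3.2` model that the Backend example later in this guide refers to.

   ```sh
   # Pull the model if it is not on your machine yet
   ollama pull llama3.2

   # Sanity check: list models through Ollama's OpenAI-compatible API
   curl http://$OLLAMA_HOST:11434/v1/models
   ```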
## Set up Ollama with AI Gateway
To use Ollama with AI Gateway, create Backend and HTTPRoute resources.
1. Create the Backend resource so that you can route requests to the Ollama server. Review the following table to understand this configuration.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.kgateway.dev/v1alpha1
   kind: Backend
   metadata:
     labels:
       app: ai-gateway
     name: ollama
     namespace: kgateway-system
   spec:
     type: AI
     ai:
       llm:
         host: $OLLAMA_HOST # replace with your IP address
         port: 11434
         openai:
           model: "llama3.2" # replace with your model
           authToken:
             kind: Inline
             inline: "$TOKEN"
   EOF
   ```

   | Setting | Description |
   | ------- | ----------- |
   | `host` | Your local IP address from the previous step (`$OLLAMA_HOST`). |
   | `port` | The port of your local LLM provider (default port 11434 for Ollama). |
   | `authToken` | Although authentication is not required for your local Ollama server, the `authToken` field is required to create an AI Backend in kgateway. You can provide any placeholder value for the inline token. |
   | `model` | The Ollama model that you want to use, such as `llama3.2`. |
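   To confirm that the Backend resource was created, you can list it. This is a sketch; the columns that you see depend on your kgateway version.

   ```sh
   # Use the fully qualified resource name to avoid clashes with other CRDs
   kubectl get backends.gateway.kgateway.dev ollama -n kgateway-system
   ```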
2. Create an HTTPRoute resource that routes incoming traffic to the Backend. The following example sets up a route on the `/ollama` path to the Backend that you previously created. The `URLRewrite` filter rewrites the path from `/ollama` to the API path that you want to use in the LLM provider, `/v1/models`.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: ollama
     namespace: kgateway-system
     labels:
       app: ai-gateway
   spec:
     parentRefs:
       - name: ai-gateway
         namespace: kgateway-system
     rules:
       - matches:
           - path:
               type: PathPrefix
               value: /ollama
         filters:
           - type: URLRewrite
             urlRewrite:
               path:
                 type: ReplaceFullPath
                 replaceFullPath: /v1/models
         backendRefs:
           - name: ollama
             namespace: kgateway-system
             group: gateway.kgateway.dev
             kind: Backend
   EOF
   ```
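   Optionally, check that the Gateway accepted the route. The jsonpath below is a sketch that relies on the standard Gateway API status conditions.

   ```sh
   # Expect "True" for each parent Gateway that accepted the route
   kubectl get httproute ollama -n kgateway-system \
     -o jsonpath='{.status.parents[*].conditions[?(@.type=="Accepted")].status}'
   ```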
3. For local testing, port-forward the AI Gateway service.

   ```sh
   kubectl port-forward svc/ai-gateway 8080:8080 -n kgateway-system
   ```
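   Because the prerequisite sets the gateway service to NodePort, you can alternatively reach the gateway through a node address instead of a port-forward. This sketch only looks up the assigned NodePort; how you reach a node IP depends on your cluster setup.

   ```sh
   # Sketch: find the NodePort that maps to service port 8080
   kubectl get svc ai-gateway -n kgateway-system \
     -o jsonpath='{.spec.ports[?(@.port==8080)].nodePort}'
   ```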
4. Send a request to the Ollama server that you started in the previous section. Verify that the request succeeds and that you get back a response from the chat completion API.

   ```sh
   curl -v "localhost:8080/ollama" \
     -H "Content-Type: application/json" \
     -d '{
       "model": "llama3.2",
       "messages": [
         {
           "role": "system",
           "content": "You are a helpful assistant."
         },
         {
           "role": "user",
           "content": "Hello!"
         }
       ]
     }' | jq
   ```

   Example output:

   ```json
   {
     "id": "chatcmpl-534",
     "object": "chat.completion",
     "created": 1747805667,
     "model": "llama3.2",
     "system_fingerprint": "fp_ollama",
     "choices": [
       {
         "index": 0,
         "message": {
           "role": "assistant",
           "content": "It's nice to meet you. Is there something I can help you with, or would you like to chat?"
         },
         "finish_reason": "stop"
       }
     ],
     "usage": {
       "prompt_tokens": 33,
       "completion_tokens": 24,
       "total_tokens": 57
     }
   }
   ```
## Next
Now that you can send requests to an LLM provider, explore the other AI Gateway features.