# Ollama for local LLMs
Instead of a cloud LLM provider, you might want to use a local LLM provider such as Ollama for local development.
## Before you begin
1. As part of the AI Gateway setup, make sure that you set up the GatewayParameters resource to use a NodePort service.

   ```sh
   kubectl get GatewayParameters ai-gateway -n kgateway-system -o jsonpath='{.spec.kube.service.type}'
   ```

   Example output:

   ```
   NodePort
   ```
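   If the output is empty or shows a different service type, update the GatewayParameters resource before you continue. The following patch is a minimal sketch that assumes the same `spec.kube.service.type` field path as the check above; adjust it to match your setup.

   ```sh
   # Sketch: set the gateway's data plane service to NodePort.
   # Assumes the spec.kube.service.type field path from the check above.
   kubectl patch gatewayparameters ai-gateway -n kgateway-system \
     --type merge -p '{"spec":{"kube":{"service":{"type":"NodePort"}}}}'
   ```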
## Start Ollama locally
Start running an Ollama server as a local LLM provider.
1. Find your local IP address. The command depends on your operating system.

   ```sh
   # macOS
   ipconfig getifaddr en0

   # Linux
   ifconfig

   # Windows
   ipconfig
   ```

   Example output: Note the `inet 192.168.1.100` address.

   ```
   inet 192.168.1.100 netmask 255.255.255.0 broadcast 192.168.1.255
   ```
2. Set the IP address as an environment variable.

   ```sh
   export OLLAMA_HOST=192.168.1.100
   ```
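   On macOS, you can combine both steps in one line. This sketch assumes that your primary network interface is `en0`.

   ```sh
   # Sketch: set OLLAMA_HOST directly from the active interface (assumes macOS, en0)
   export OLLAMA_HOST=$(ipconfig getifaddr en0)
   echo "$OLLAMA_HOST"
   ```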
3. Start your local LLM provider.

   ```sh
   ollama serve
   ```

   Example output:

   ```
   time=2025-05-21T12:33:42.433+08:00 level=INFO source=routes.go:1205 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://192.168.181.210:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/zhengkezhou/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
   time=2025-05-21T12:33:42.436+08:00 level=INFO source=images.go:463 msg="total blobs: 12"
   time=2025-05-21T12:33:42.437+08:00 level=INFO source=images.go:470 msg="total unused blobs removed: 0"
   time=2025-05-21T12:33:42.437+08:00 level=INFO source=routes.go:1258 msg="Listening on 192.168.181.210:11434 (version 0.7.0)"
   time=2025-05-21T12:33:42.478+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="16.0 GiB" available="16.0 GiB"
   ```
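   Before you route traffic through the gateway, you might confirm that the server responds and that the model you plan to use is available locally. This sketch assumes the `llama3.2` model that the Backend example later in this guide refers to.

   ```sh
   # Pull the model if it is not on your machine yet
   ollama pull llama3.2

   # Sanity check: list models through Ollama's OpenAI-compatible API
   curl http://$OLLAMA_HOST:11434/v1/models
   ```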
## Set up Ollama with AI Gateway
To use Ollama with AI Gateway, create Backend and HTTPRoute resources.
1. Create the Backend resource so that you can route requests to the Ollama server. Review the following table to understand this configuration.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.kgateway.dev/v1alpha1
   kind: Backend
   metadata:
     labels:
       app: ai-gateway
     name: ollama
     namespace: kgateway-system
   spec:
     type: AI
     ai:
       llm:
         host: $OLLAMA_HOST # replace with your IP address
         port: 11434
         openai:
           model: "llama3.2" # replace with your model
           authToken:
             kind: Inline
             inline: "$TOKEN"
   EOF
   ```

   | Setting | Description |
   | ------- | ----------- |
   | `host` | Your local IP address from the previous step (`$OLLAMA_HOST`). |
   | `port` | The port of your local LLM provider (default port 11434 for Ollama). |
   | `authToken` | Although authentication is not required for your local Ollama server, the `authToken` field is required to create an AI Backend in kgateway. You can provide any placeholder value for the inline token. |
   | `model` | The Ollama model that you want to use, such as `llama3.2`. |
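   To confirm that the Backend resource was created, you can list it. This is a sketch; the columns that you see depend on your kgateway version.

   ```sh
   # Use the fully qualified resource name to avoid clashes with other CRDs
   kubectl get backends.gateway.kgateway.dev ollama -n kgateway-system
   ```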
2. Create an HTTPRoute resource that routes incoming traffic to the Backend. The following example sets up a route on the `/ollama` path to the Backend that you previously created. The `URLRewrite` filter rewrites the path from `/ollama` to the API path that you want to use in the LLM provider, `/v1/models`.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: ollama
     namespace: kgateway-system
     labels:
       app: ai-gateway
   spec:
     parentRefs:
       - name: ai-gateway
         namespace: kgateway-system
     rules:
       - matches:
           - path:
               type: PathPrefix
               value: /ollama
         filters:
           - type: URLRewrite
             urlRewrite:
               path:
                 type: ReplaceFullPath
                 replaceFullPath: /v1/models
         backendRefs:
           - name: ollama
             namespace: kgateway-system
             group: gateway.kgateway.dev
             kind: Backend
   EOF
   ```
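   Optionally, check that the Gateway accepted the route. The jsonpath below is a sketch that relies on the standard Gateway API status conditions.

   ```sh
   # Expect "True" for each parent Gateway that accepted the route
   kubectl get httproute ollama -n kgateway-system \
     -o jsonpath='{.status.parents[*].conditions[?(@.type=="Accepted")].status}'
   ```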
3. For local testing, port-forward the AI Gateway service.

   ```sh
   kubectl port-forward svc/ai-gateway 8080:8080 -n kgateway-system
   ```
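   Because the prerequisite sets the gateway service to NodePort, you can alternatively reach the gateway through a node address instead of a port-forward. This sketch only looks up the assigned NodePort; how you reach a node IP depends on your cluster setup.

   ```sh
   # Sketch: find the NodePort that maps to service port 8080
   kubectl get svc ai-gateway -n kgateway-system \
     -o jsonpath='{.spec.ports[?(@.port==8080)].nodePort}'
   ```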
4. Send a request to the Ollama server that you started in the previous section. Verify that the request succeeds and that you get back a response from the chat completion API.

   ```sh
   curl -v "localhost:8080/ollama" \
     -H "Content-Type: application/json" \
     -d '{
       "model": "llama3.2",
       "messages": [
         {
           "role": "system",
           "content": "You are a helpful assistant."
         },
         {
           "role": "user",
           "content": "Hello!"
         }
       ]
     }' | jq
   ```

   Example output:

   ```json
   {
     "id": "chatcmpl-534",
     "object": "chat.completion",
     "created": 1747805667,
     "model": "llama3.2",
     "system_fingerprint": "fp_ollama",
     "choices": [
       {
         "index": 0,
         "message": {
           "role": "assistant",
           "content": "It's nice to meet you. Is there something I can help you with, or would you like to chat?"
         },
         "finish_reason": "stop"
       }
     ],
     "usage": {
       "prompt_tokens": 33,
       "completion_tokens": 24,
       "total_tokens": 57
     }
   }
   ```
## Next
Now that you can send requests to an LLM provider, explore the other AI Gateway features.