Multiple endpoints

Configure access to multiple OpenAI API endpoints, such as chat completions, embeddings, and models, through the same Backend.

About

To set up multiple LLM endpoints, use the ai.llm.routes field. This field maps API paths to supported route types: the keys are URL suffix matchers, such as /v1/models, and the values are route types, such as completions or passthrough.

  • completions: Transforms the request into the LLM provider's format and processes it with the LLM provider. This route type supports full LLM features such as tokenization, rate limiting, transformations, and other policies like prompt guards.
  • passthrough: Forwards the request to the LLM provider as-is. This route type does not support LLM features such as route processing and policies. You might use it for non-chat endpoints such as health checks, GET requests like listing models, or custom endpoints whose traffic you want to forward unchanged.

Paths are matched in order, and the first match determines how the request is handled. The wildcard character * matches anything. If no routes field is set, all requests default to the completions route type.
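
For example, the following fragment sketches the shape of a routes map. It is an illustration of the matching order, not a complete Backend: the specific chat completions path is evaluated before the wildcard, so only unmatched paths fall through to passthrough.

    llm:
      # Provider settings, such as the openai block, go here.
      routes:
        # Entries are matched in order; the first match wins.
        "/v1/chat/completions": "completions"   # full LLM processing
        "*": "passthrough"                      # everything else is forwarded as-is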

Before you begin

  1. Set up an agentgateway proxy.
  2. Set up API access to each LLM provider that you want to use. The example in this guide uses OpenAI.
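
     For OpenAI, setting up API access typically means storing the API key in a Kubernetes secret that the Backend can reference. The following sketch shows one way to do that, assuming your API key is exported as OPENAI_API_KEY; the Authorization key name and Bearer prefix are assumptions, so check the provider setup guide for the exact format your installation expects.

    # Hypothetical sketch: store the OpenAI API key in a secret named
    # openai-secret, which the Backend example later references.
    # The "Authorization" key name and "Bearer" prefix are assumptions.
    kubectl create secret generic openai-secret \
      -n kgateway-system \
      --from-literal="Authorization=Bearer $OPENAI_API_KEY"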

Configure multiple endpoints

Configure access to multiple endpoints in your LLM provider, such as chat completions, embeddings, and models, through the same Backend. The following steps use OpenAI as an example.

  1. Update your Backend resource to include a routes field that maps API paths to route types.

    kubectl apply -f- <<EOF
    apiVersion: gateway.kgateway.dev/v1alpha1
    kind: Backend
    metadata:
      name: openai
      namespace: kgateway-system
    spec:
      type: AI
      ai:
        llm:
          openai:
            authToken:
              kind: SecretRef
              secretRef:
                name: openai-secret
            model: "gpt-3.5-turbo"
          routes:
            "/v1/chat/completions": "completions"
            "/v1/embeddings": "passthrough"
            "/v1/models": "passthrough"
            "*": "passthrough"
    EOF

    Review the following settings to understand this configuration.

    • /v1/chat/completions: Routes to the chat completions endpoint with LLM-specific processing. This endpoint is used for chat-based interactions. For more information, see the OpenAI API docs for the endpoint.
    • /v1/embeddings: Routes to the embeddings endpoint with passthrough processing. This endpoint is used to get vector embeddings of input text, which machine learning models can use more easily than raw text. For more information, see the OpenAI API docs for the endpoint.
    • /v1/models: Routes to the models endpoint with passthrough processing. This endpoint is used to get basic information about the available models. For more information, see the OpenAI API docs for the endpoint.
    • *: Matches any path that the other entries do not. Typically, you set this value to passthrough so that unmatched traffic is forwarded to the provider API without LLM-specific processing.
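
    Optionally, verify that the Backend resource was created and inspect its configuration:

    kubectl get backend openai -n kgateway-system -o yaml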
  2. Create an HTTPRoute resource that routes traffic to the OpenAI Backend on the /openai path prefix. Note that because you set up the routes map on the Backend, you do not need to create any URLRewrite filters to point your route matcher to the correct LLM provider endpoint.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: openai
      namespace: kgateway-system
    spec:
      parentRefs:
        - name: agentgateway
          namespace: kgateway-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /openai
        backendRefs:
        - name: openai
          namespace: kgateway-system
          group: gateway.kgateway.dev
          kind: Backend
    EOF
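
    Optionally, confirm that the route attached to the gateway. In standard Gateway API fashion, the status.parents section of the HTTPRoute reports whether the parent Gateway accepted the route:

    kubectl get httproute openai -n kgateway-system -o yaml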
  3. Send requests to different OpenAI endpoints. With the routes configured, you can reach each endpoint by including its full path in the request. The following examples use the gateway address that you stored in the INGRESS_GW_ADDRESS environment variable. If you access the gateway locally instead, such as through port-forwarding, replace $INGRESS_GW_ADDRESS:8080 with localhost:8080.
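
    If you have not set INGRESS_GW_ADDRESS yet, one way to look it up, assuming your Gateway is named agentgateway and publishes its address in the standard Gateway API status field, is the following:

    # Assumes a Gateway named "agentgateway" in the kgateway-system namespace.
    export INGRESS_GW_ADDRESS=$(kubectl get gateway agentgateway -n kgateway-system -o jsonpath='{.status.addresses[0].value}')
    echo $INGRESS_GW_ADDRESS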

    Chat completions:

    curl "$INGRESS_GW_ADDRESS:8080/openai/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello!"}]
      }' | jq

    Embeddings:

    curl "$INGRESS_GW_ADDRESS:8080/openai/v1/embeddings" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "text-embedding-ada-002",
        "input": "The food was delicious"
      }' | jq

    Models list:

    curl "$INGRESS_GW_ADDRESS:8080/openai/v1/models" | jq
