Multiple endpoints

Configure access to multiple OpenAI API endpoints, such as chat completions, embeddings, and models, through the same Backend.

About

To set up multiple LLM endpoints, use the ai.llm.routes field. This field maps API paths to supported route types: the keys are URL suffix matchers, such as /v1/models, and the values are route types, such as completions or passthrough.

  • completions: Transforms the request into the LLM provider's format and processes it with the LLM provider. This route type supports full LLM features such as tokenization, rate limiting, transformations, and other policies like prompt guards.
  • passthrough: Forwards the request to the LLM provider as-is. This route type does not support LLM features such as route processing and policies. You might use it for non-chat endpoints such as health checks, GET requests like listing models, or custom endpoints whose traffic you want to forward unchanged.

Paths are matched in order, and the first match determines how the request is handled. The wildcard character * matches anything. If no routes field is set, all requests default to the completions route type.
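
For example, the following fragment sketches the shape of a routes map. It is an illustration of the matching order, not a complete Backend: the specific chat completions path is evaluated before the wildcard, so only unmatched paths fall through to passthrough.

    llm:
      # Provider settings, such as the openai block, go here.
      routes:
        # Entries are matched in order; the first match wins.
        "/v1/chat/completions": "completions"   # full LLM processing
        "*": "passthrough"                      # everything else is forwarded as-is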

Before you begin

  1. Set up an agentgateway proxy.
  2. Set up API access to each LLM provider that you want to use. The example in this guide uses OpenAI.
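
     For OpenAI, setting up API access typically means storing the API key in a Kubernetes secret that the Backend can reference. The following sketch shows one way to do that, assuming your API key is exported as OPENAI_API_KEY; the Authorization key name and Bearer prefix are assumptions, so check the provider setup guide for the exact format your installation expects.

    # Hypothetical sketch: store the OpenAI API key in a secret named
    # openai-secret, which the Backend example later references.
    # The "Authorization" key name and "Bearer" prefix are assumptions.
    kubectl create secret generic openai-secret \
      -n kgateway-system \
      --from-literal="Authorization=Bearer $OPENAI_API_KEY"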

Configure multiple endpoints

Configure access to multiple endpoints in your LLM provider, such as chat completions, embeddings, and models, through the same Backend. The following steps use OpenAI as an example.

  1. Update your Backend resource to include a routes field that maps API paths to route types.

    kubectl apply -f- <<EOF
    apiVersion: gateway.kgateway.dev/v1alpha1
    kind: Backend
    metadata:
      name: openai
      namespace: kgateway-system
    spec:
      type: AI
      ai:
        llm:
          openai:
            authToken:
              kind: SecretRef
              secretRef:
                name: openai-secret
            model: "gpt-3.5-turbo"
          routes:
            "/v1/chat/completions": "completions"
            "/v1/embeddings": "passthrough"
            "/v1/models": "passthrough"
            "*": "passthrough"
    EOF

    Review the following settings to understand this configuration.

    • /v1/chat/completions: Routes to the chat completions endpoint with LLM-specific processing. This endpoint is used for chat-based interactions. For more information, see the OpenAI API docs for the endpoint.
    • /v1/embeddings: Routes to the embeddings endpoint with passthrough processing. This endpoint is used to get vector embeddings of input text, which machine learning models can use more easily than raw text. For more information, see the OpenAI API docs for the endpoint.
    • /v1/models: Routes to the models endpoint with passthrough processing. This endpoint is used to get basic information about the available models. For more information, see the OpenAI API docs for the endpoint.
    • *: Matches any path that the other entries do not. Typically, you set this value to passthrough so that unmatched traffic is forwarded to the provider API without LLM-specific processing.
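
    Optionally, verify that the Backend resource was created and inspect its configuration:

    kubectl get backend openai -n kgateway-system -o yaml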
  2. Create an HTTPRoute resource that routes traffic to the OpenAI Backend on the /openai path prefix. Note that because you set up the routes map on the Backend, you do not need to create any URLRewrite filters to point your route matcher to the correct LLM provider endpoint.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: openai
      namespace: kgateway-system
    spec:
      parentRefs:
        - name: agentgateway
          namespace: kgateway-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /openai
        backendRefs:
        - name: openai
          namespace: kgateway-system
          group: gateway.kgateway.dev
          kind: Backend
    EOF
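
    Optionally, confirm that the route attached to the gateway. In standard Gateway API fashion, the status.parents section of the HTTPRoute reports whether the parent Gateway accepted the route:

    kubectl get httproute openai -n kgateway-system -o yaml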
  3. Send requests to different OpenAI endpoints. With the routes configured, you can reach each endpoint by including its full path in the request. The following examples use the gateway address that you stored in the INGRESS_GW_ADDRESS environment variable. If you access the gateway locally instead, such as through port-forwarding, replace $INGRESS_GW_ADDRESS:8080 with localhost:8080.
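
    If you have not set INGRESS_GW_ADDRESS yet, one way to look it up, assuming your Gateway is named agentgateway and publishes its address in the standard Gateway API status field, is the following:

    # Assumes a Gateway named "agentgateway" in the kgateway-system namespace.
    export INGRESS_GW_ADDRESS=$(kubectl get gateway agentgateway -n kgateway-system -o jsonpath='{.status.addresses[0].value}')
    echo $INGRESS_GW_ADDRESS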

    Chat completions:

    curl "$INGRESS_GW_ADDRESS:8080/openai/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello!"}]
      }' | jq

    Embeddings:

    curl "$INGRESS_GW_ADDRESS:8080/openai/v1/embeddings" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "text-embedding-ada-002",
        "input": "The food was delicious"
      }' | jq

    Models list:

    curl "$INGRESS_GW_ADDRESS:8080/openai/v1/models" | jq
