# Multiple endpoints
Configure access to multiple OpenAI API endpoints, such as chat completions, embeddings, and models, through the same Backend.
## About
To set up multiple LLM endpoints, use the `ai.llm.routes` field. This field maps API paths to supported route types: the keys are URL suffix matches, such as `/v1/models`, and the values are route types, such as `completions` or `passthrough`.
- `completions`: Transforms the request to the LLM provider format and processes it with the LLM provider. This route type supports full LLM features such as tokenization, rate limiting, transformations, and other policies like prompt guards.
- `passthrough`: Forwards the request to the LLM provider as-is. This route type does not support LLM features like route processing and policies. You might use this route type for non-chat endpoints such as health checks, `GET` requests like listing models, or custom endpoints that you want to pass traffic through to.
Paths are matched in order, and the first match determines how the request is handled. The wildcard character `*` can be used to match anything. If no route is set, requests default to the `completions` route type.
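The matching rules above can be sketched as follows. This is an illustrative model only, not the gateway's actual implementation, and the `select_route_type` helper is hypothetical:

```python
# Illustrative sketch of first-match route selection as described above.
# Assumptions: keys match as URL suffixes, "*" is a wildcard that matches
# anything, and an unmatched path falls back to the "completions" route type.
from fnmatch import fnmatch

routes = {
    "/v1/chat/completions": "completions",
    "/v1/embeddings": "passthrough",
    "/v1/models": "passthrough",
    "*": "passthrough",
}

def select_route_type(path: str) -> str:
    for pattern, route_type in routes.items():
        # A key matches if the request path ends with it, or if the
        # wildcard pattern matches the whole path.
        if path.endswith(pattern) or fnmatch(path, pattern):
            return route_type
    # Default when no route is set for the path.
    return "completions"

print(select_route_type("/openai/v1/chat/completions"))  # completions
print(select_route_type("/openai/v1/models"))            # passthrough
```

Because the example map ends with a `"*"` entry, every request matches some route; without it, unmatched paths would fall back to `completions`.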
## Before you begin
- Set up an agentgateway proxy.
- Set up API access to each LLM provider that you want to use. The example in this guide uses OpenAI.
## Configure multiple endpoints
Configure access to multiple endpoints in your LLM provider, such as chat completions, embeddings, and models, through the same Backend. The following steps use OpenAI as an example.
- Update your Backend resource to include a `routes` field that maps API paths to route types.

  ```yaml
  kubectl apply -f- <<EOF
  apiVersion: gateway.kgateway.dev/v1alpha1
  kind: Backend
  metadata:
    name: openai
    namespace: kgateway-system
  spec:
    type: AI
    ai:
      llm:
        openai:
          authToken:
            kind: SecretRef
            secretRef:
              name: openai-secret
          model: "gpt-3.5-turbo"
        routes:
          "/v1/chat/completions": "completions"
          "/v1/embeddings": "passthrough"
          "/v1/models": "passthrough"
          "*": "passthrough"
  EOF
  ```

  Review the following table to understand this configuration.

  | Setting | Description |
  | ------- | ----------- |
  | `/v1/chat/completions` | Routes to the chat completions endpoint with LLM-specific processing. This endpoint is used for chat-based interactions. For more information, see the OpenAI API docs for the endpoint. |
  | `/v1/embeddings` | Routes to the embeddings endpoint with passthrough processing. This endpoint is used to get vector embeddings that machine learning models can use more easily than chat-based interactions. For more information, see the OpenAI API docs for the endpoint. |
  | `/v1/models` | Routes to the models endpoint with passthrough processing. This endpoint is used to get basic information about the models that are available. For more information, see the OpenAI API docs for the endpoint. |
  | `*` | Matches any path that doesn't match the specific endpoints otherwise set. Typically, you set this value to `passthrough` to pass through to the provider API without LLM-specific processing. |
- Create an HTTPRoute resource that routes traffic to the OpenAI Backend along the `/openai` path matcher. Note that because you set up the `routes` map on the Backend, you do not need to create any URLRewrite filters to point your route matcher to the correct LLM provider endpoint.

  ```yaml
  kubectl apply -f- <<EOF
  apiVersion: gateway.networking.k8s.io/v1
  kind: HTTPRoute
  metadata:
    name: openai
    namespace: kgateway-system
  spec:
    parentRefs:
    - name: agentgateway
      namespace: kgateway-system
    rules:
    - matches:
      - path:
          type: PathPrefix
          value: /openai
      backendRefs:
      - name: openai
        namespace: kgateway-system
        group: gateway.kgateway.dev
        kind: Backend
  EOF
  ```
- Send requests to different OpenAI endpoints. With the routes configured, you can access different OpenAI endpoints by including the full path in your requests.

  Chat completions:

  ```sh
  curl "$INGRESS_GW_ADDRESS:8080/openai/v1/chat/completions" \
    -H content-type:application/json \
    -d '{
      "model": "gpt-3.5-turbo",
      "messages": [{"role": "user", "content": "Hello!"}]
    }' | jq
  ```

  Embeddings:

  ```sh
  curl "$INGRESS_GW_ADDRESS:8080/openai/v1/embeddings" \
    -H content-type:application/json \
    -d '{
      "model": "text-embedding-ada-002",
      "input": "The food was delicious"
    }' | jq
  ```

  Models list:

  ```sh
  curl "$INGRESS_GW_ADDRESS:8080/openai/v1/models" | jq
  ```

  If you port-forward the gateway locally instead of using an external address, send the same requests to `localhost:8080` in place of `$INGRESS_GW_ADDRESS:8080`.
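To work with a passthrough response programmatically instead of piping it through `jq`, you can parse the JSON that the provider returns. The following sketch assumes the standard OpenAI models-list shape (`{"object": "list", "data": [...]}`); the trimmed sample payload and the `model_ids` helper are illustrative, not part of the gateway:

```python
# Parse a models-list response as returned by the passthrough /v1/models route.
# The sample payload below is a trimmed stand-in for a real OpenAI response.
import json

sample_response = json.dumps({
    "object": "list",
    "data": [
        {"id": "gpt-3.5-turbo", "object": "model", "owned_by": "openai"},
        {"id": "text-embedding-ada-002", "object": "model", "owned_by": "openai-internal"},
    ],
})

def model_ids(raw: str) -> list[str]:
    """Return the model IDs from an OpenAI-style models-list payload."""
    body = json.loads(raw)
    return [model["id"] for model in body.get("data", [])]

print(model_ids(sample_response))  # ['gpt-3.5-turbo', 'text-embedding-ada-002']
```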