Vertex AI
Configure Vertex AI as an LLM provider in agentgateway.
Before you begin
Set up an agentgateway proxy.
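The verification step later in this guide sends requests to an `INGRESS_GW_ADDRESS` variable. As a minimal sketch, assuming your agentgateway proxy is exposed through a LoadBalancer Service named `agentgateway-proxy` in the `agentgateway-system` namespace, you can capture its external address as follows. Adjust the Service name, namespace, and jsonpath field to your environment.

```sh
# Store the external address of the agentgateway proxy Service.
# On providers that assign a hostname instead of an IP, use .hostname in the jsonpath.
export INGRESS_GW_ADDRESS=$(kubectl get svc agentgateway-proxy -n agentgateway-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo $INGRESS_GW_ADDRESS
```

If your cluster does not provision an external load balancer, you can instead port-forward the proxy Service and send requests to `localhost:8080`, as shown in the verification step.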
Set up access to Vertex AI
Set up authentication for Vertex AI. Make sure to have your:

- Google Cloud Project ID
- Project location, such as `us-central1`
- API key or service account credentials
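If you do not have credentials yet, the following sketch shows one way to prepare them with the gcloud CLI. It assumes you are already authenticated to the correct Google Cloud account; whether you use a short-lived access token, an API key, or service account credentials depends on how you want to authenticate to Vertex AI.

```sh
# Point gcloud at the project that hosts your Vertex AI resources.
gcloud config set project my-gcp-project

# Make sure the Vertex AI API is enabled in the project.
gcloud services enable aiplatform.googleapis.com

# Print a short-lived OAuth access token that can serve as a bearer credential.
gcloud auth print-access-token
```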
Save your Vertex AI API key as an environment variable.

```sh
export VERTEX_AI_API_KEY=<insert your API key>
```
Create a Kubernetes secret to store your Vertex AI API key.
```sh
kubectl apply -f- <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: vertex-ai-secret
  namespace: agentgateway-system
type: Opaque
stringData:
  Authorization: $VERTEX_AI_API_KEY
EOF
```
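Optionally, confirm that the secret exists and stores the expected value before you reference it from the backend.

```sh
# Decode the Authorization key of the secret to verify its contents.
kubectl get secret vertex-ai-secret -n agentgateway-system \
  -o jsonpath='{.data.Authorization}' | base64 -d
```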
Create an AgentgatewayBackend resource that configures Vertex AI as the LLM provider and references the API key secret that you created earlier. Update `projectId` and `region` to match your Google Cloud project ID and location.

```sh
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: vertex-ai
  namespace: agentgateway-system
spec:
  ai:
    provider:
      vertexai:
        model: gemini-pro
        projectId: "my-gcp-project"
        region: "us-central1"
  policies:
    auth:
      secretRef:
        name: vertex-ai-secret
EOF
```
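To check that the backend was accepted, you can inspect the resource status. The plural resource name `agentgatewaybackends.agentgateway.dev` is assumed from the kind and group; adjust it if your installation registers the CRD differently.

```sh
# Review the backend resource and its status conditions.
kubectl get agentgatewaybackends.agentgateway.dev vertex-ai \
  -n agentgateway-system -o yaml
```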
Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend. The following example sets up a route on the `/vertex` path to the AgentgatewayBackend that you previously created. Requests that match this path are forwarded to the Vertex AI provider that the backend configures.

```sh
kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: vertex-ai
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: agentgateway-proxy
      namespace: agentgateway-system
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /vertex
      backendRefs:
        - name: vertex-ai
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
EOF
```
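You can verify that the gateway accepted the route and resolved its backend reference by checking the route status reported by the `agentgateway-proxy` parent.

```sh
# Show the status conditions that the gateway reports for this route.
kubectl get httproute vertex-ai -n agentgateway-system \
  -o jsonpath='{.status.parents[0].conditions}'
```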
Send a request to the LLM provider API. Verify that the request succeeds and that you get back a response from the API.

If your gateway is exposed through an external address:

```sh
curl "$INGRESS_GW_ADDRESS/vertex" -H content-type:application/json -d '{
  "model": "",
  "messages": [
    {
      "role": "user",
      "content": "Write me a short poem about Kubernetes and clouds."
    }
  ]
}' | jq
```

If you port-forward the proxy for local testing:

```sh
curl "localhost:8080/vertex" -H content-type:application/json -d '{
  "model": "",
  "messages": [
    {
      "role": "user",
      "content": "Write me a short poem about Kubernetes and clouds."
    }
  ]
}' | jq
```

Example output:
{ "id": "chatcmpl-vertex-12345", "object": "chat.completion", "created": 1727967462, "model": "gemini-pro", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "In the cloud, Kubernetes reigns,\nOrchestrating pods with great care,\nContainers float like clouds,\nScaling up and down,\nAutomation everywhere." }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 12, "completion_tokens": 28, "total_tokens": 40 } }
Next steps
- Want to use other endpoints than chat completions, such as embeddings or models? Check out the multiple endpoints guide.
- Explore other guides for LLM consumption, such as function calling, model failover, and prompt guards.