Local rate limiting

Limit the number of requests that are allowed to enter the cluster before global rate limiting and external auth policies are applied.

About local rate limiting

Local rate limiting is a coarse-grained rate limiting capability that is primarily used as a first line of defense to limit the number of requests that are forwarded to your rate limit servers.

Without local rate limiting, all requests are forwarded directly to the rate limit server that you set up, where each request is either allowed or denied based on the global rate limiting settings that you configured. During an attack, however, so many requests might be forwarded to your rate limit servers that they become overloaded or even fail.

To protect your rate limit servers from being overloaded and to optimize their resource utilization, you can set up local rate limiting in conjunction with global rate limiting. Because local rate limiting is enforced in each Envoy instance that makes up your gateway, no rate limit server is required in this setup. For example, if you have 5 Envoy instances that together represent your gateway, each instance is configured with the limit that you set.

For more information about local rate limiting, see the Envoy documentation.

Architecture

The following image shows how local rate limiting works in kgateway. As clients send requests to a backend destination, they first reach the Envoy instance that represents your gateway. Local rate limiting settings are applied to each Envoy pod or process individually. For example, if you have 5 Envoy instances that are each configured with a local rate limit of 10 requests per second, the total number of allowed requests per second is 50 (5 x 10). In a global rate limiting setup, the limit is shared across all Envoy instances, so the total number of allowed requests per second is 10.

Depending on your setup, each Envoy instance or pod is configured with a number of tokens in a token bucket. To allow a request, a token must be available in the bucket so that it can be assigned to the downstream connection. Token buckets are refilled periodically, as defined in the refill settings of the local rate limiting configuration. If no token is available, the connection is closed immediately, and a 429 HTTP response code is returned to the client.

When a token is available in the token bucket, it is assigned to the incoming connection. The request is then forwarded to your rate limit server to enforce any global rate limiting settings. For example, the request might be further rate limited based on headers or query parameters. Only requests that are within both the local and global rate limits are forwarded to the backend destination in the cluster.

Figure: Local rate limiting
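Under the hood, these token bucket settings correspond to Envoy's local rate limit HTTP filter. The following is a simplified, illustrative sketch of such a filter configuration; the field names come from Envoy's local_ratelimit filter, and the exact configuration that kgateway generates for your gateway might differ.

    # Illustrative Envoy HTTP filter configuration for local rate limiting.
    # Each Envoy instance keeps its own token bucket with these settings.
    http_filters:
    - name: envoy.filters.http.local_ratelimit
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: http_local_rate_limiter
        token_bucket:
          max_tokens: 10        # tokens available per Envoy instance
          tokens_per_fill: 10   # tokens added at each refill
          fill_interval: 1s     # how often the bucket is refilled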

Local rate limiting in kgateway

In kgateway, you use a TrafficPolicy to set up local rate limiting for your routes. You can choose between the following attachment options:

  • A particular route in an HTTPRoute resource: Use the extensionRef filter in the HTTPRoute to attach the TrafficPolicy to the route you want to rate limit. For an example, see Route configuration.
  • All routes in an HTTPRoute: Use the targetRefs section in the TrafficPolicy to attach the policy to a particular HTTPRoute resource. For an example, see the sketch at the end of this section.
  • All routes that the Gateway serves: Use the targetRefs section in the TrafficPolicy to attach the policy to a Gateway. For an example, see Gateway configuration.

Note that if you apply a TrafficPolicy to an HTTPRoute and to a Gateway at the same time, the HTTPRoute policy takes precedence. For more information, see Multiple targetRefs TrafficPolicies.
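For example, to rate limit all routes in a particular HTTPRoute, you can attach the TrafficPolicy with targetRefs. The following is a minimal sketch that assumes an HTTPRoute named httpbin-ratelimit in the httpbin namespace and uses the same targetRefs fields as the Gateway example later in this guide.

    apiVersion: gateway.kgateway.dev/v1alpha1
    kind: TrafficPolicy
    metadata:
      name: local-ratelimit
      namespace: httpbin
    spec:
      # Attach the policy to every route in this HTTPRoute.
      targetRefs:
      - group: gateway.networking.k8s.io
        kind: HTTPRoute
        name: httpbin-ratelimit
      rateLimit:
        local:
          tokenBucket:
            maxTokens: 1
            tokensPerFill: 1
            fillInterval: 100s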

Before you begin

  1. Follow the Get started guide to install kgateway.

  2. Follow the Sample app guide to create an API gateway proxy with an HTTP listener and deploy the httpbin sample app.

  3. Get the external address of the gateway and save it in an environment variable. If your load balancer does not assign an external address, you can port-forward the gateway deployment instead and send requests to localhost:8080.

    # Get the address from the gateway's load balancer service.
    export INGRESS_GW_ADDRESS=$(kubectl get svc -n kgateway-system http -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
    echo $INGRESS_GW_ADDRESS
    # Alternatively, port-forward the gateway deployment and use localhost:8080.
    kubectl port-forward deployment/http -n kgateway-system 8080:8080

Route configuration

Set up local rate limiting for a particular route.

  1. Create a TrafficPolicy with your local rate limiting settings.

    kubectl apply -f- <<EOF
    apiVersion: gateway.kgateway.dev/v1alpha1
    kind: TrafficPolicy
    metadata:
      name: local-ratelimit
      namespace: httpbin
    spec:
      rateLimit:
        local:
          tokenBucket:
            maxTokens: 1
            tokensPerFill: 1
            fillInterval: 100s
    EOF
    Setting        Description
    maxTokens      The maximum number of tokens that are available to use.
    tokensPerFill  The number of tokens that are added to the bucket during a refill.
    fillInterval   The interval after which the token bucket is refilled, such as 100s.
  2. Create an HTTPRoute that routes requests on the ratelimit.example domain to the httpbin app and rate limits them. To apply the TrafficPolicy that you created earlier, you use the extensionRef filter.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: httpbin-ratelimit
      namespace: httpbin
    spec:
      parentRefs:
      - name: http
        namespace: kgateway-system
      hostnames:
      - ratelimit.example
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /
        filters:
        - type: ExtensionRef
          extensionRef:
            name: local-ratelimit
            group: gateway.kgateway.dev
            kind: TrafficPolicy
        backendRefs:
        - name: httpbin
          port: 8000
    EOF
  3. Send a request to the httpbin app. Verify that you get back a 200 HTTP response code.

    # LoadBalancer address
    curl -vik http://$INGRESS_GW_ADDRESS:8080/status/200 -H "host: ratelimit.example:8080"
    # Port-forwarded gateway
    curl -vik localhost:8080/status/200 -H "host: ratelimit.example"

    Example output:

    * Request completely sent off
    < HTTP/1.1 200 OK
    HTTP/1.1 200 OK
    < access-control-allow-credentials: true
    access-control-allow-credentials: true
    < access-control-allow-origin: *
    access-control-allow-origin: *
    < content-length: 0
    content-length: 0
    < x-envoy-upstream-service-time: 1
    x-envoy-upstream-service-time: 1
    < server: envoy
    server: envoy
  4. Send another request to the httpbin app. This time, the request is denied with a 429 HTTP response code and a local_rate_limited message in your CLI output. Because the route is configured with only 1 token that is refilled every 100 seconds, the token was assigned to the first request's connection, and no token was available for the second request. If you wait 100 seconds, the token bucket is refilled and the route can accept a new connection. To watch the token bucket refill, see the optional check after this procedure.

    # LoadBalancer address
    curl -vik http://$INGRESS_GW_ADDRESS:8080/status/200 -H "host: ratelimit.example:8080"
    # Port-forwarded gateway
    curl -vik localhost:8080/status/200 -H "host: ratelimit.example"

    Example output:

    ...
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 429 Too Many Requests
    HTTP/1.1 429 Too Many Requests
    < x-ratelimit-limit: 1
    x-ratelimit-limit: 1
    < x-ratelimit-remaining: 0
    x-ratelimit-remaining: 0
    < x-ratelimit-reset: 79
    x-ratelimit-reset: 79
    ...
    Connection #0 to host 34.XXX.XX.XXX left intact
    local_rate_limited      
  5. Remove the resources that you created in this guide.

    kubectl delete TrafficPolicy local-ratelimit -n httpbin
    kubectl delete httproute httpbin-ratelimit -n httpbin
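Before you remove the resources in step 5, you can optionally watch the rate limit headers to see the token bucket refill. The following loop is a sketch; it re-sends the request and prints only the status line and the x-ratelimit headers. Use localhost:8080 instead of the gateway address if you port-forwarded the gateway.

    # Send a few requests and print the status line and rate limit headers.
    # With maxTokens: 1 and fillInterval: 100s, only one request per 100 seconds
    # succeeds on each Envoy instance; x-ratelimit-reset shows the seconds left
    # until the next refill.
    for i in 1 2 3; do
      curl -sik http://$INGRESS_GW_ADDRESS:8080/status/200 \
        -H "host: ratelimit.example:8080" | grep -iE "^HTTP/|x-ratelimit"
      sleep 1
    done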

Gateway configuration

Instead of applying local rate limiting to a particular route, you can also apply it to an entire gateway. This way, the local rate limiting settings are applied to all the routes that the gateway serves.

  1. Create a TrafficPolicy with your local rate limiting settings. Use the targetRefs section to apply the policy to a specific Gateway. The policy automatically applies to all the routes that the Gateway serves.

    kubectl apply -f- <<EOF
    apiVersion: gateway.kgateway.dev/v1alpha1
    kind: TrafficPolicy
    metadata:
      name: local-ratelimit
      namespace: kgateway-system
    spec:
      targetRefs: 
      - group: gateway.networking.k8s.io
        kind: Gateway
        name: http
      rateLimit:
        local:
          tokenBucket:
            maxTokens: 1
            tokensPerFill: 1
            fillInterval: 100s
    EOF
    Setting        Description
    targetRefs     Select the Gateway that you want to apply your local rate limiting configuration to. In this example, the policy is applied to all the routes that the http gateway serves.
    maxTokens      The maximum number of tokens that are available to use.
    tokensPerFill  The number of tokens that are added to the bucket during a refill.
    fillInterval   The interval after which the token bucket is refilled, such as 100s.
  2. Send a request to the httpbin app on the www.example.com domain that you set up as part of the getting started tutorial. Verify that the request succeeds.

    # LoadBalancer address
    curl -vik http://$INGRESS_GW_ADDRESS:8080/status/200 -H "host: www.example.com:8080"
    # Port-forwarded gateway
    curl -vik localhost:8080/status/200 -H "host: www.example.com"

    Example output:

    * Request completely sent off
    < HTTP/1.1 200 OK
    HTTP/1.1 200 OK
    < access-control-allow-credentials: true
    access-control-allow-credentials: true
    < access-control-allow-origin: *
    access-control-allow-origin: *
    < content-length: 0
    content-length: 0
    < x-envoy-upstream-service-time: 1
    x-envoy-upstream-service-time: 1
    < server: envoy
    server: envoy
  3. Send another request to the httpbin app. This time, the request is denied with a 429 HTTP response code and a local_rate_limited message in your CLI output. Because the gateway is configured with only 1 token that is refilled every 100 seconds, the token was assigned to the first request's connection, and no token was available for the second request. If you wait 100 seconds, the token bucket is refilled and the gateway can accept a new connection. For how the number of gateway replicas affects the effective limit, see the note after this procedure.

    # LoadBalancer address
    curl -vik http://$INGRESS_GW_ADDRESS:8080/status/200 -H "host: www.example.com:8080"
    # Port-forwarded gateway
    curl -vik localhost:8080/status/200 -H "host: www.example.com"

    Example output:

    ...
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 429 Too Many Requests
    HTTP/1.1 429 Too Many Requests
    < x-ratelimit-limit: 1
    x-ratelimit-limit: 1
    < x-ratelimit-remaining: 0
    x-ratelimit-remaining: 0
    < x-ratelimit-reset: 79
    x-ratelimit-reset: 79
    ...
    Connection #0 to host 34.XXX.XX.XXX left intact
    local_rate_limited      
  4. Remove the resources that you created in this guide.

    kubectl delete TrafficPolicy local-ratelimit -n kgateway-system
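
Keep in mind that the limit is enforced separately by each Envoy instance that makes up your gateway (see Architecture). As a quick, optional check, you can look up how many gateway replicas are running; with N replicas, the effective cluster-wide limit is roughly N x maxTokens per fill interval. The following assumes that the gateway deployment is named http, as in this guide.

    # Count the gateway replicas; each replica keeps its own token bucket.
    kubectl get deployment http -n kgateway-system -o jsonpath='{.spec.replicas}'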