Robot - Rate Limiting

About

A page about rate limiting of HTTP request that is implemented to control bot.

A rate limiter caps how many requests a sender (user / IP address ) can issue in a specific window of time (e.g. 25 requests per minute)

Characteristic

Storage:
- Distributed or not (ie shared storage / database or not)
- Minimum storage
Rate limit window calculation:
- Rolling to enforce the rate smoothly on the whole period
- Fix where burst of requests can happen at the interval limit
Minimum Time Between Request

Algorithm

Token Bucket

Token_bucket ¹⁾ to defined limits on bandwidth (Implementation: Php Tocken bucket)

For each new request, the rate limiter:

will fetch the following data by user:
- the last request’s timestamp
- the available token count left
would then first increment the available tokens based on:
- the refill rate
- and the time of the user’s last request.
would decrement the available tokens

This is a read/write operation that should happen atomically and in isolation if the rate limiter data is shared.

Leaky Bucket

Leaky_bucket ²⁾ is the inverse of token_bucket

The leaky bucket algorithm is widely used to deal with burstiness when bandwidth is limited such as in:

in telecommunications
and packet‑switched computer networks

The analogy is with a bucket where water:

is poured in at the top
and leaks from the bottom

If the rate at which water is poured in exceeds the rate at which it leaks, the bucket overflows.

In terms of request processing ³⁾:

the water represents requests from clients,
the bucket represents a queue where requests wait to be processed according to a first‑in‑first‑out (FIFO) scheduling algorithm.
the leaking water represents requests exiting the the queue for processing by the server,
the overflow represents requests that are discarded and never serviced.

Fixed window counters

Records the number of requests from a sender occurring in a rate limit’s time interval

Data:

user
interval id (timestamp)
counter

Due to the fix interval, a user could make twice the allowed requests in a short amount of time:

the first allowed requests just before the end of an interval
the second allowed requests just after the begin of the interval

Sliding window log

A Sliding window log stores all of user’s requests in a single sorted set (deque) ordered by timestamp

For each new request, the rate limiter would:

remove expired requests
add the current request
count the request in the set
allow / reject (or wait)

Because every request is recorded, the memory footprint can be huge if you store the data at the lowest grain level

On a day basis, for an application with

a rate limit of 100 requests per user
5,000 active users
a timestamp stored on 4byte

the total maximum memory could be <MATH> 100 . 5000 . 4 = 2000000 \text{byte} = 19 Mb </MATH>

A solution is to store the data at a highest grain such as the number of request by minute if you want to define your rate limit at the hour.

Configuration

Robots - Crawl-delay parameters set the number of seconds that the crawler should wait before another crawl.
The og:ttl object property will limit crawler access (ex: Facebook)

Response

Applications should respond to requests from users who surpass the rate limit with a HTTP 429 Too Many Requests response status code. ie the user has sent too many requests in a given amount of time (“rate limiting”).

However when attackers see a 429 response, they simply create new accounts to circumvent the rate limiting. Another response is to implement a shadow ban. (Shadow banned account don't see that their contributions are banned)

⁴⁾

Tool

Nginx

The ngx_http_limit_req_module ⁵⁾ uses the leaky_bucket method.

¹⁾

wiki/Token_bucket

²⁾

wiki/Leaky_bucket

³⁾

Leaky bucket Nginx

⁴⁾

https://medium.com/figma-design/an-alternative-approach-to-rate-limiting-f8a06cf7c94c

⁵⁾

https://nginx.org/en/docs/http/ngx_http_limit_req_module.html