Robot - Rate Limiting

Robots Useragent

Robot - Rate Limiting

About

A page about rate limiting of HTTP request that is implemented to control bot.

A rate limiter caps how many requests a sender (user / IP address ) can issue in a specific window of time (e.g. 25 requests per minute)

Characteristic

  • Storage:
    • Distributed or not (ie shared storage / database or not)
    • Minimum storage
  • Rate limit window calculation:
    • Rolling to enforce the rate smoothly on the whole period
    • Fix where burst of requests can happen at the interval limit
  • Minimum Time Between Request

Algorithm

Token Bucket

Token_bucket 1) to defined limits on bandwidth (Implementation: Php Tocken bucket)

For each new request, the rate limiter:

  • will fetch the following data by user:
    • the last request’s timestamp
    • the available token count left
  • would then first increment the available tokens based on:
    • the refill rate
    • and the time of the user’s last request.
  • would decrement the available tokens

This is a read/write operation that should happen atomically and in isolation if the rate limiter data is shared.

Leaky Bucket

Leaky_bucket 2) is the inverse of token_bucket

The leaky bucket algorithm is widely used to deal with burstiness when bandwidth is limited such as in:

  • in telecommunications
  • and packet‑switched computer networks

The analogy is with a bucket where water:

  • is poured in at the top
  • and leaks from the bottom

If the rate at which water is poured in exceeds the rate at which it leaks, the bucket overflows.

In terms of request processing 3):

  • the water represents requests from clients,
  • the bucket represents a queue where requests wait to be processed according to a first‑in‑first‑out (FIFO) scheduling algorithm.
  • the leaking water represents requests exiting the the queue for processing by the server,
  • the overflow represents requests that are discarded and never serviced.

Fixed window counters

Records the number of requests from a sender occurring in a rate limit’s time interval

Data:

  • user
  • interval id (timestamp)
  • counter

Due to the fix interval, a user could make twice the allowed requests in a short amount of time:

  • the first allowed requests just before the end of an interval
  • the second allowed requests just after the begin of the interval

Sliding window log

A Sliding window log stores all of user’s requests in a single sorted set (deque) ordered by timestamp

For each new request, the rate limiter would:

  • remove expired requests
  • add the current request
  • count the request in the set
  • allow / reject (or wait)

Because every request is recorded, the memory footprint can be huge if you store the data at the lowest grain level

On a day basis, for an application with

  • a rate limit of 100 requests per user
  • 5,000 active users
  • a timestamp stored on 4byte

the total maximum memory could be <MATH> 100 . 5000 . 4 = 2000000 \text{byte} = 19 Mb </MATH>

A solution is to store the data at a highest grain such as the number of request by minute if you want to define your rate limit at the hour.

Configuration

  • Robots - Crawl-delay parameters set the number of seconds that the crawler should wait before another crawl.
  • The og:ttl object property will limit crawler access (ex: Facebook)

Response

Applications should respond to requests from users who surpass the rate limit with a HTTP 429 Too Many Requests response status code. ie the user has sent too many requests in a given amount of time (“rate limiting”).

However when attackers see a 429 response, they simply create new accounts to circumvent the rate limiting. Another response is to implement a shadow ban. (Shadow banned account don't see that their contributions are banned)

4)

Tool

Nginx

The ngx_http_limit_req_module 5) uses the leaky_bucket method.





Discover More
Abuse Detection Github
Security - Abuse Detection

Abuse detection mechanism are generally based on: rate limiting. behavioral analysis or machine learning. ie: scoring every request by how different it is from the baseline. a sort of bot score...
Security - Brut Force Attack

brut force attack is another word for password guessing. This is a brut attack. Strong password Wait time after false attempt
Robots Useragent
Web Robot - Crawler

A web crawler is an crawler application that reads web resources (mostly a web page) and parse them to extract meaningful information. A crawl cycle consists of 4 steps: Selects the urls to fetch...



Share this page:
Follow us:
Task Runner