What Is Rate Limiting?

Rate limiting restricts how many requests a user, IP address, API key, or token can make within a defined period. It protects system resources and reduces the impact of brute-force attempts, scraping, spam, and misconfigured integrations.

When an API allows 100 requests per minute, the 101st request within that minute is typically rejected with HTTP status 429 Too Many Requests. Well-designed responses also include headers such as Retry-After so clients know how long to wait before trying again.
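As a minimal single-process sketch, assuming a fixed one-minute window and an in-memory counter per client, the check below accepts up to 100 requests and answers the 101st with 429 and a Retry-After value in whole seconds. The class name `PerMinuteLimiter` and its `(status, headers)` return shape are hypothetical, not any framework's API.

```python
import math
import time
from collections import defaultdict


class PerMinuteLimiter:
    """Hypothetical fixed-window limiter: at most `limit` requests per client per window."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        # (client_id, window_index) -> accepted requests; stale windows are not pruned here
        self.counts = defaultdict(int)

    def check(self, client_id):
        """Return (status_code, headers) for one incoming request."""
        now = self.clock()
        window_index = int(now // self.window)
        key = (client_id, window_index)
        if self.counts[key] >= self.limit:
            # Whole seconds until the current window resets (at least 1)
            reset_in = math.ceil((window_index + 1) * self.window - now)
            return 429, {"Retry-After": str(max(reset_in, 1))}
        self.counts[key] += 1
        return 200, {}
```

With the clock fixed at 30 seconds into a window, the first 100 calls for one client return 200 and the 101st returns 429 with `Retry-After: 30`; other clients are counted separately.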

Common Algorithms

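Fixed window: Counts requests in fixed intervals, such as each clock minute; simple, but a client can send up to twice the limit in a short burst that spans a window boundary.
Sliding window: Counts requests over the trailing period ending at the current moment, for example by keeping a log of request timestamps, which avoids the boundary burst.
Token bucket: Refills tokens at a steady rate up to a fixed capacity; each request spends a token, so short bursts up to that capacity are allowed.
Leaky bucket: Queues requests and processes them at a fixed rate, smoothing sudden load; requests that arrive when the queue is full are rejected.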
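A fixed window needs only one counter per client per interval. The other three algorithms keep a little more state; the sketch below shows one way to implement each, assuming a single process, no thread safety, and an injectable `clock` function for testing. The class names are hypothetical, and the leaky bucket is written in a scheduling form: rather than holding requests in an explicit queue, it tells each caller how long to wait for its turn, or returns None when the queue would overflow.

```python
import time
from collections import deque


class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.timestamps = deque()  # times of accepted requests, oldest first

    def allow(self):
        now = self.clock()
        # Drop timestamps that have fallen out of the trailing window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request spends one token."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Add tokens earned since the last call, never exceeding capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class LeakyBucket:
    """Process requests at `rate` per second with at most `capacity` in the system.

    allow() returns the delay in seconds before this request's turn,
    or None if the bucket is full and the request is rejected.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.interval = 1.0 / rate  # fixed spacing between processed requests
        self.capacity, self.clock = capacity, clock
        self.next_free = clock()  # earliest time the next request may be processed

    def allow(self):
        now = self.clock()
        start = max(now, self.next_free)
        # Number of requests already scheduled ahead of this one
        ahead = (start - now) / self.interval
        if ahead >= self.capacity:
            return None
        self.next_free = start + self.interval
        return start - now
```

The sliding log is exact but stores one timestamp per accepted request, while the token bucket and this leaky bucket keep only a couple of numbers per client.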

Business Use

Rate limiting matters for payment APIs, login forms, search endpoints, file uploads, and public integrations. If limits are too low, legitimate customers are blocked; if they are too high, infrastructure cost and abuse risk rise. Limits should reflect user type, plan, endpoint cost, and security risk.
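One way to make limits reflect plan and endpoint cost is to give each API key a token bucket sized by its plan and to charge expensive endpoints more tokens per call. The plan names, rates, endpoint strings, and costs below are made-up placeholders, not recommendations, and `PlanAwareLimiter` is a hypothetical name.

```python
import time
from dataclasses import dataclass

# Hypothetical values for illustration only
PLAN_LIMITS = {  # plan -> (tokens refilled per second, bucket capacity)
    "free": (1.0, 10),
    "pro": (10.0, 100),
}
ENDPOINT_COST = {  # costlier or riskier endpoints spend more tokens per call
    "GET /search": 5,
    "POST /login": 2,
    "GET /status": 1,
}


@dataclass
class WeightedBucket:
    rate: float
    capacity: float
    tokens: float
    last: float

    def spend(self, cost, now):
        # Refill for elapsed time, then try to pay the endpoint's cost
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PlanAwareLimiter:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.buckets = {}  # (api_key, plan) -> WeightedBucket

    def allow(self, api_key, plan, endpoint):
        now = self.clock()
        key = (api_key, plan)  # a plan change starts a fresh bucket
        bucket = self.buckets.get(key)
        if bucket is None:
            rate, capacity = PLAN_LIMITS[plan]
            bucket = WeightedBucket(rate, capacity, float(capacity), now)
            self.buckets[key] = bucket
        return bucket.spend(ENDPOINT_COST.get(endpoint, 1), now)
```

Here a free-plan key can make two searches in a burst, while a pro-plan key can make twenty before waiting for refills.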

An API gateway often enforces rate limits centrally, but API design should still treat quotas, error messages, and client retry behavior as one contract.
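On the client side, retry behavior can honor Retry-After when the server sends it and fall back to backoff otherwise. This sketch uses a hypothetical helper, `retry_delay`, that accepts the header's two standard forms (a number of seconds or an HTTP-date) and otherwise returns capped exponential backoff with full jitter; the `base` and `cap` defaults are arbitrary assumptions.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after, attempt, base=0.5, cap=60.0, now=None):
    """Seconds to wait before retrying after a 429 (hypothetical helper)."""
    if retry_after:
        value = retry_after.strip()
        try:
            # Delay-seconds form, e.g. "120"
            return max(0.0, float(int(value)))
        except ValueError:
            pass
        try:
            # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):  # older Pythons raise TypeError
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable header: random delay in [0, min(cap, base * 2**attempt)]
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A caller would sleep for the returned delay and give up after a bounded number of attempts; the jitter spreads retries from many clients so they do not all hit the limit again at the same moment.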