Understanding the Thundering Herd Problem

When systems suddenly receive a massive burst of identical requests, they can collapse under their own load. This phenomenon is known as the Thundering Herd Problem.
A Simple Real-World Analogy
Imagine a popular store launching a limited-edition product.
Before the store opens, hundreds of people wait outside.
The moment the doors open:
• Everyone rushes in at the same time
• The staff have to handle the sudden crowd
• Checkout counters become overloaded
• The store becomes chaotic
This sudden, synchronized rush is exactly what can happen to a software system, and it is called the Thundering Herd Problem.
So what is the Thundering Herd Problem?
The Thundering Herd Problem occurs when many clients simultaneously request the same resource, usually after a shared dependency becomes available.
Instead of requests being distributed over time, they arrive all at once, overwhelming the system.
This typically happens when:
• A cache entry expires
• A service restarts
• A lock is released
• A popular event triggers a sudden surge of traffic
Basic Architecture Example
Most modern systems follow this structure:
Client → Application → Cache → Database
Normally the flow looks like this:
Client requests data
Application checks cache
If cache hit → return data instantly
If cache miss → fetch from database
Note: Caches protect databases from heavy traffic by answering repeated reads without a database query.
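The read path above is commonly called the cache-aside pattern. The sketch below shows it in Python, assuming a tiny hypothetical in-memory `TTLCache` class and a caller-supplied `fetch_from_db` function as stand-ins for a real cache (such as Redis) and a real database; none of these names come from a specific library.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after a TTL (hypothetical stand-in for Redis etc.)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # TTL elapsed: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)


def get_data(key: str, cache: TTLCache,
             fetch_from_db: Callable[[str], Any], ttl: float = 300.0) -> Any:
    """Cache-aside read: serve hits from the cache, load misses from the DB and store them."""
    value = cache.get(key)
    if value is not None:          # cache hit -> return data instantly
        return value
    value = fetch_from_db(key)     # cache miss -> fetch from database
    cache.set(key, value, ttl)     # repopulate so the next request is a hit
    return value
```

For simplicity, a stored `None` is treated as a miss; a production cache would distinguish "absent" from "cached empty value".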
Where It Commonly Occurs
This problem appears frequently in modern backend systems.
Caching Systems & Databases
When a cached value expires, thousands of clients may request the same data simultaneously.
Load Balancers
A load balancer spreads a sudden spike of identical requests across servers, but those servers all depend on the same backend services, which still get overwhelmed.
What Happens When Cache TTL Expires
Suppose a cache entry has a TTL (Time To Live) of 5 minutes.
For 5 minutes everything works perfectly.
But when the TTL expires:
Thousands of users request the same data.
The cache no longer has the data.
Every request hits the database simultaneously.
Result:
Clients → App → Cache MISS → Database
Clients → App → Cache MISS → Database
Clients → App → Cache MISS → Database
The database suddenly receives thousands of identical queries, and this spike overloads it.
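To see why a single expiry is so dangerous, the hypothetical simulation below releases 50 threads at the same instant against a cache whose entry has just expired. Every thread sees the miss before the first database query finishes, so each one issues its own query. The sleep-based `query_database` function is an assumption standing in for a slow database.

```python
import threading
import time

cache = {}                 # shared cache; the hot entry has just expired
db_queries = 0             # how many queries reached the "database"
counter_lock = threading.Lock()


def query_database(key):
    """Hypothetical slow DB call that counts how often it is hit."""
    global db_queries
    with counter_lock:     # protects the counter only, not the query
        db_queries += 1
    time.sleep(0.1)        # simulate query latency
    return f"value-for-{key}"


def naive_get(key):
    value = cache.get(key)
    if value is None:                  # every concurrent caller sees the miss...
        value = query_database(key)    # ...so every caller queries the DB
        cache[key] = value
    return value


def simulate_expiry(num_clients=50):
    """Release num_clients requests at the same instant; return the DB query count."""
    global db_queries
    cache.clear()                      # model the TTL having just expired
    db_queries = 0
    start = threading.Barrier(num_clients)

    def client():
        start.wait()                   # all clients proceed together
        naive_get("trending")

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_queries
```

Calling `print(simulate_expiry())` typically reports a count at or near 50 for a single key, where one query would have been enough.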
Normal Traffic Spike vs Thundering Herd
Not all spikes are dangerous.
Normal Traffic Spike
Requests increase gradually.
Example:
100 → 200 → 500 → 800
The system has time to scale out, manually or through autoscaling.
Thundering Herd
Requests arrive all at once.
Example:
0 → 5000 requests instantly
Few systems can absorb that efficiently without safeguards.
Why It Becomes Dangerous in Distributed Systems
In distributed environments:
• Multiple services depend on shared caches
• Many instances trigger the same fallback logic
• Failures cascade across services
One expired cache entry can cause:
Cache failure → DB overload → API timeout → Service failures
This is known as a cascading failure.
Impact on System Components
CPU
Servers suddenly process thousands of identical requests.
CPU usage can spike to 100%.
Database
The database receives repeated queries for the same data.
This leads to:
• connection exhaustion
• slow queries
• possible crashes
Cache
The cache becomes ineffective because too many clients try to regenerate the same data simultaneously.
Latency
Response time increases dramatically.
Example:
Normal response: 50 ms
During a thundering herd: 2000 ms+
Real-World Examples
IPL Match Ticket Booking
When ticket sales open, millions of fans refresh the page simultaneously.
The system receives a synchronized burst of traffic.
Netflix New Show Release
When a new season drops, millions of users open the same page at the same time.
This can trigger a massive wave of simultaneous cache misses.
Flash Sales (Amazon / Flipkart)
Limited stock + synchronized users = perfect thundering herd conditions.
Techniques to Prevent the Thundering Herd Problem
Well-designed distributed systems combine several mitigation strategies.
1. Request Coalescing
Instead of allowing every request to hit the database, only one request fetches the data.
Other requests wait for the result.
Example:
1000 requests
↓
1 request queries database
↓
999 requests reuse result
This drastically reduces load.
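A minimal in-process sketch of request coalescing, loosely modeled on the idea behind Go's `golang.org/x/sync/singleflight` package: the first caller for a key becomes the leader and runs the fetch, while concurrent callers for the same key wait on an event and reuse the leader's result. The `RequestCoalescer` class and its `do` method are hypothetical names, and this version only coalesces requests within one process.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """A single in-flight fetch that other callers can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[Exception] = None


class RequestCoalescer:
    """Collapse concurrent fetches for the same key into one call (in-process only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            is_leader = call is None
            if call is None:               # first caller for this key leads the fetch
                call = _Call()
                self._in_flight[key] = call

        if is_leader:
            try:
                call.result = fn()         # the only database query for this burst
            except Exception as exc:
                call.error = exc           # share the failure with waiters too
            finally:
                with self._lock:
                    del self._in_flight[key]   # the next burst starts a fresh fetch
                call.done.set()            # wake every waiting request
        else:
            call.done.wait()               # followers block until the leader finishes

        if call.error is not None:
            raise call.error
        return call.result
```

With many application servers, each server still sends at most one query per key per burst, which is usually a large reduction; coalescing across servers needs shared coordination such as a distributed lock.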
2. Cache Locking (Mutex)
When the cache expires:
First request acquires a lock
It regenerates the cache
Other requests wait
Flow:
Request 1 → lock → DB query
Request 2 → wait
Request 3 → wait
Once the cache is updated, all requests receive the cached value.
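The locking flow can be sketched with one in-process lock per cache key plus a second cache check after the lock is acquired: requests that were waiting find the refreshed value and skip the database. The module-level `cache` dict and the `get_with_lock` and `load_from_db` names are illustrative assumptions.

```python
import threading
from collections import defaultdict
from typing import Any, Callable, Dict

cache: Dict[str, Any] = {}
_key_locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
_registry_lock = threading.Lock()   # guards creation of per-key locks


def _lock_for(key: str) -> threading.Lock:
    with _registry_lock:
        return _key_locks[key]


def get_with_lock(key: str, load_from_db: Callable[[str], Any]) -> Any:
    value = cache.get(key)
    if value is not None:               # fast path: cache hit, no locking
        return value
    with _lock_for(key):                # first request takes the lock; the rest wait here
        value = cache.get(key)          # re-check: the lock holder may have refilled it
        if value is None:
            value = load_from_db(key)   # only the lock holder queries the DB
            cache[key] = value
        return value
```

Across multiple servers the lock has to live in shared storage; one common choice is a Redis key written with `SET key value NX PX <milliseconds>`, so the lock expires on its own if its holder crashes. That variant is not shown here.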
3. Staggered Cache Expiry
Instead of giving every cache entry the same TTL, introduce randomness.
Example:
TTL = 300 seconds ± random(30)
This prevents many keys from expiring simultaneously.
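The randomized TTL formula translates directly into code. The sketch assumes a 300-second base and ±30 seconds of uniform integer jitter, matching the example; `jittered_ttl` is a hypothetical helper name.

```python
import random

BASE_TTL_SECONDS = 300
JITTER_SECONDS = 30


def jittered_ttl(base: int = BASE_TTL_SECONDS, jitter: int = JITTER_SECONDS) -> int:
    """Return base plus a random offset in [-jitter, +jitter] seconds (both ends inclusive)."""
    return base + random.randint(-jitter, jitter)
```

Each cache write would pass `jittered_ttl()` as its expiry, so keys written at the same moment expire across a 60-second window instead of in the same second.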
4. Exponential Backoff
If a request fails or cannot acquire a lock, it retries after increasing delays.
Example retry pattern:
100 ms
200 ms
400 ms
800 ms
This spreads out traffic.
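The retry pattern can be expressed as a doubling schedule with a cap. The sketch below reproduces the 100, 200, 400, 800 ms sequence and adds optional random jitter (sleeping a random time between 0 and the scheduled delay), an assumption beyond the text that keeps many clients from retrying in lockstep. `backoff_delays` and `retry_with_backoff` are hypothetical names.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(base_ms: int = 100, attempts: int = 4, cap_ms: int = 10_000) -> List[int]:
    """Exponential schedule: base, 2*base, 4*base, ... each capped at cap_ms."""
    return [min(cap_ms, base_ms * 2 ** i) for i in range(attempts)]


def retry_with_backoff(op: Callable[[], T], base_ms: int = 100,
                       attempts: int = 4, jitter: bool = True) -> T:
    """Call op(); on failure wait per the schedule and retry, re-raising after the last retry."""
    delays = backoff_delays(base_ms, attempts)
    for i in range(attempts + 1):          # 1 initial try + `attempts` retries
        try:
            return op()
        except Exception:
            if i == attempts:
                raise                      # out of retries: surface the error
            delay_ms = delays[i]
            if jitter:
                delay_ms = random.uniform(0, delay_ms)  # randomize to desynchronize clients
            time.sleep(delay_ms / 1000)
    raise AssertionError("unreachable")
```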
5. Rate Limiting
Limit the number of requests allowed per second.
Example:
1000 requests/sec allowed
Extra requests queued or rejected
This protects backend services.
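The text does not prescribe an algorithm; a token bucket is one common way to enforce a per-second limit and is swapped in here as an assumption. Tokens refill at `rate` per second up to `capacity`, and each request spends one; when the bucket is empty, the request should be queued or rejected. The `TokenBucket` name is hypothetical, and the injectable `clock` parameter exists only to make the class easy to test.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: refills `rate` tokens/sec up to `capacity`; each request costs 1."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)      # start full so a burst up to capacity is allowed
        self._clock = clock
        self._last = clock()
        self._lock = threading.Lock()      # allow() may be called from many threads

    def allow(self) -> bool:
        with self._lock:
            now = self._clock()
            # Refill for the time elapsed since the last check, never above capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
            self._last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False                   # caller decides: queue the request or reject it
```

With `TokenBucket(rate=1000, capacity=1000)`, a burst of 1500 simultaneous requests lets 1000 through and turns the remaining 500 away until tokens refill.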



