Understanding the Thundering Herd Problem


When systems suddenly receive a massive burst of identical requests, they can collapse under their own load. This phenomenon is known as the Thundering Herd Problem.


A Simple Real-World Analogy

Imagine a popular store launching a limited-edition product.

Before the store opens, hundreds of people wait outside.

The moment the doors open:

• Everyone rushes in at the same time
• The staff have to handle the sudden crowd
• Checkout counters become overloaded
• The store becomes chaotic

This sudden, synchronized rush is exactly what happens to a software system during a thundering herd.

So what is the Thundering Herd Problem?

The Thundering Herd Problem occurs when many clients simultaneously request the same resource, usually after a shared dependency becomes available.

Instead of requests being distributed over time, they arrive all at once, overwhelming the system.

This typically happens when:

• A cache entry expires
• A service restarts
• A lock is released
• A popular event triggers a surge of traffic

Basic Architecture Example

Most modern systems place a cache between the application and the database.

Normally the flow looks like this:

  1. Client requests data

  2. Application checks cache

  3. If cache hit → return data instantly

  4. If cache miss → fetch from database

Note: Caches protect databases from heavy traffic.
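The four steps above can be sketched in a few lines. This is a minimal illustration, not a production cache: the `cache` and `database` dictionaries, the `stats` counter, and the function names are all hypothetical stand-ins for a real cache server and database.

```python
# Hypothetical in-memory stand-ins for a real cache and database.
cache = {}
database = {"user:1": "Alice"}
stats = {"db_queries": 0}  # counts how often the database is actually queried


def query_database(key):
    """Simulate a database read and record that it happened."""
    stats["db_queries"] += 1
    return database[key]


def get(key):
    """Return the value for key, consulting the cache first."""
    if key in cache:              # cache hit: return the data immediately
        return cache[key]
    value = query_database(key)   # cache miss: fetch from the database
    cache[key] = value            # populate the cache for later requests
    return value
```

Calling `get("user:1")` twice queries the database only once; the second call is served from the cache.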

Where It Commonly Occurs

This problem appears frequently in modern backend systems.

Caching Systems & Databases

When a cached value expires, thousands of clients may request the same data simultaneously.

Load Balancers

A sudden spike of identical requests gets distributed across servers but still overwhelms backend services.

What Happens When Cache TTL Expires

Suppose a cache entry has a TTL (Time To Live) of 5 minutes.

For 5 minutes everything works perfectly.

But when the TTL expires:

  1. Thousands of users request the same data.

  2. The cache no longer has the data.

  3. Every request hits the database simultaneously.

Result:

Clients → App → Cache MISS → Database
Clients → App → Cache MISS → Database
Clients → App → Cache MISS → Database

The database suddenly receives thousands of identical queries, and this spike overloads it.
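To make the effect concrete, here is a small simulation of the moment right after expiry. Everything in it is an assumption for illustration: 50 threads stand in for clients, a dictionary stands in for the cache, and a barrier forces every client to see the miss before any of them refills the cache, which models requests arriving at the same instant.

```python
import threading

NUM_CLIENTS = 50
cache = {}                       # empty: the TTL has just expired
stats = {"db_queries": 0}
stats_lock = threading.Lock()
# Every client must observe the miss before anyone repopulates the cache.
all_missed = threading.Barrier(NUM_CLIENTS)


def query_database(key):
    """Simulated database read that counts how often it is called."""
    with stats_lock:
        stats["db_queries"] += 1
    return "value-for-" + key


def handle_request(key):
    if key in cache:
        return cache[key]
    all_missed.wait()            # simultaneous arrival: all clients missed
    value = query_database(key)  # so every client goes to the database
    cache[key] = value
    return value


threads = [
    threading.Thread(target=handle_request, args=("hot-key",))
    for _ in range(NUM_CLIENTS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the run, `stats["db_queries"]` equals `NUM_CLIENTS`: one expired key produced one database query per client.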

Normal Traffic Spike vs Thundering Herd

Not all spikes are dangerous.

Normal Traffic Spike

Requests increase gradually.

Example:

100 → 200 → 500 → 800

The system has time to scale up or autoscale.


Thundering Herd

Requests arrive all at the same instant.

Example:

0 → 5000 requests instantly

Few systems can absorb a jump like that without safeguards.

Why It Becomes Dangerous in Distributed Systems

In distributed environments:

• Multiple services depend on shared caches
• Many instances trigger the same fallback logic
• Failures cascade across services

One expired cache entry can cause:

Cache miss → DB overload → API timeout → Service failures

This is known as a cascading failure.

Impact on System Components

CPU

Servers suddenly process thousands of identical requests.

CPU usage can spike toward 100%.


Database

The database receives repeated queries for the same data.

This leads to:

• connection exhaustion
• slow queries
• possible crashes


Cache

The cache becomes ineffective because too many clients try to regenerate the same data simultaneously.


Latency

Response time increases dramatically.

Example:

Normal response:

50 ms

During a thundering herd:

2000 ms+

Real-World Examples

IPL Match Ticket Booking

When ticket sales open, millions of fans refresh the page simultaneously.

The system receives a synchronized burst of traffic.


Netflix New Show Release

When a new season drops, millions of users open the same page at the same time.

This can cause massive cache misses.


Flash Sales (Amazon / Flipkart)

Limited stock + synchronized users = perfect thundering herd conditions.

Techniques to Prevent the Thundering Herd Problem

Good distributed systems implement several mitigation strategies.


1. Request Coalescing

Instead of allowing every request to hit the database, only one request fetches the data.

Other requests wait for the result.

Example:

1000 requests
↓
1 request queries database
↓
999 requests reuse result

This drastically reduces load.
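One way to sketch this in Python is a small "single flight" helper. The class name, the `do` method, and the `shared_calls` counter are hypothetical names chosen for this example; the idea is only that the first caller for a key runs the fetch, and callers that arrive while it is running wait on the same result.

```python
import threading
from concurrent.futures import Future


class SingleFlight:
    """Coalesce concurrent calls for the same key into one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}     # key -> Future shared by all callers
        self.shared_calls = 0    # callers that reused an in-flight fetch

    def do(self, key, fn):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future
            else:
                self.shared_calls += 1
        if leader:
            try:
                future.set_result(fn())       # only the leader runs fn
            except Exception as exc:
                future.set_exception(exc)     # followers see the same error
            finally:
                with self._lock:
                    del self._in_flight[key]  # later calls start a fresh fetch
        return future.result()                # everyone gets the leader's result
```

A request handler would call something like `flight.do("hot-key", load_from_database)`, where `load_from_database` is whatever function performs the real query. Note that this only coalesces requests inside one process; coalescing across several servers needs shared coordination.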


2. Cache Locking (Mutex)

When the cache expires:

  1. First request acquires a lock

  2. It regenerates the cache

  3. Other requests wait

Flow:

Request 1 → lock → DB query
Request 2 → wait
Request 3 → wait

Once the cache is updated, all requests receive the cached value.
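The flow above might look like this with an in-process lock. This is a sketch under simplifying assumptions: `cache` is a plain dictionary, a single `rebuild_lock` guards all keys (real systems often lock per key, or use a distributed lock when several servers share a cache), and `query_database` is a hypothetical stand-in for the real query.

```python
import threading

cache = {}
stats = {"db_queries": 0}
rebuild_lock = threading.Lock()   # one lock for all keys, for brevity


def query_database(key):
    """Simulated database read; only ever called while holding rebuild_lock."""
    stats["db_queries"] += 1
    return "value-for-" + key


def get(key):
    value = cache.get(key)
    if value is not None:          # fast path: no locking on a cache hit
        return value
    with rebuild_lock:             # only one request rebuilds at a time
        value = cache.get(key)     # re-check: an earlier holder may have filled it
        if value is None:
            value = query_database(key)
            cache[key] = value
        return value
```

The second check inside the lock is what matters: requests that waited for the lock find the freshly written value and never touch the database.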


3. Staggered Cache Expiry

Instead of giving every cache entry the same TTL, introduce randomness.

Example:

TTL = 300 seconds ± random(30)

This prevents many keys from expiring simultaneously.
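A minimal sketch of that formula, assuming the offset is drawn uniformly from -30 to +30 seconds. The function and constant names are made up for this example; the returned value would be passed as the TTL when writing to whatever cache is in use.

```python
import random

BASE_TTL_SECONDS = 300
JITTER_SECONDS = 30


def jittered_ttl(base=BASE_TTL_SECONDS, jitter=JITTER_SECONDS, rng=random):
    """Return the base TTL shifted by a uniform offset in [-jitter, +jitter]."""
    return base + rng.uniform(-jitter, jitter)
```

Each call returns a value between 270 and 330 seconds, so keys written at the same moment expire at slightly different times.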


4. Exponential Backoff

If a request fails or cannot acquire a lock, it retries after increasing delays.

Example retry pattern:

100 ms
200 ms
400 ms
800 ms

This spreads out traffic.
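The retry pattern above can be sketched as a small helper. The names `backoff_delays_ms` and `retry` are hypothetical, and the defaults simply reproduce the 100, 200, 400, 800 ms sequence; real deployments often also add a random offset to each delay so that many clients do not retry in lockstep, which is left out here.

```python
import time


def backoff_delays_ms(base_ms=100, factor=2, retries=4):
    """Return the wait before each retry: base, base*factor, base*factor**2, ..."""
    return [base_ms * factor ** attempt for attempt in range(retries)]


def retry(operation, base_ms=100, factor=2, retries=4, sleep=time.sleep):
    """Call operation, retrying with growing delays if it raises."""
    delays = backoff_delays_ms(base_ms, factor, retries)
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == retries:
                raise                      # out of retries: surface the last error
            sleep(delays[attempt] / 1000)  # time.sleep expects seconds
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` is fine.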


5. Rate Limiting

Limit the number of requests allowed per second.

Example:

1000 requests/sec allowed
extra requests queued or rejected

This protects backend services.
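One common way to enforce such a limit is a token bucket, sketched below. This is an illustrative, single-threaded version with hypothetical names: tokens refill at `rate` per second up to `capacity`, each request spends one token, and a request that finds the bucket empty is rejected (a caller could queue it instead).

```python
import time


class TokenBucket:
    """Allow about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock                 # injectable so tests can control time
        self.tokens = float(capacity)      # start with a full bucket
        self.last_refill = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(rate=1000, capacity=1000)`, a burst of 1500 simultaneous requests lets 1000 through and rejects the rest. A server handling requests on multiple threads would need to guard `allow` with a lock.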