<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Caching Strategies in Distributed System]]></title><description><![CDATA[Caching Strategies in Distributed System]]></description><link>https://caching-strategies-in-distributed-system.hashnode.dev</link><generator>RSS for Node</generator><lastBuildDate>Tue, 09 Jun 2026 16:48:05 GMT</lastBuildDate><atom:link href="https://caching-strategies-in-distributed-system.hashnode.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Cache Strategies in Distributed Systems]]></title><description><![CDATA[Caching is one of the most powerful tools in system design.
It helps reduce:

database load

response latency

infrastructure cost


However, basic TTL-based caching is not always enough.
In high-traf]]></description><link>https://caching-strategies-in-distributed-system.hashnode.dev/cache-strategies-in-distributed-systems</link><guid isPermaLink="true">https://caching-strategies-in-distributed-system.hashnode.dev/cache-strategies-in-distributed-systems</guid><dc:creator><![CDATA[Rajesh]]></dc:creator><pubDate>Sun, 08 Mar 2026 15:13:41 GMT</pubDate><content:encoded><![CDATA[<p>Caching is one of the most powerful tools in system design.</p>
<p>It helps reduce:</p>
<ul>
<li><p>database load</p>
</li>
<li><p>response latency</p>
</li>
<li><p>infrastructure cost</p>
</li>
</ul>
<p>However, <strong>basic TTL-based caching is not always enough</strong>.</p>
<p>In high-traffic systems, improper caching strategies can actually <strong>cause system failures</strong> due to the <strong>Thundering Herd Problem</strong>, also known as a cache stampede.</p>
<p>In this article, we’ll explore <strong>advanced caching techniques used by large-scale systems like Netflix, CDNs, and e-commerce platforms</strong> to prevent traffic spikes and maintain stability.</p>
<hr />
<h1>The Problem: Many Cache Keys Expiring at the Same Time</h1>
<p>Most caching systems use <strong>TTL (Time-To-Live)</strong>.</p>
<p>Example:</p>
<pre><code class="language-plaintext">Product Data Cache TTL = 60 seconds
</code></pre>
<p>For 60 seconds the cache serves requests.</p>
<p>After 60 seconds:</p>
<ul>
<li><p>the cache entry expires</p>
</li>
<li><p>the next request recomputes the data</p>
</li>
</ul>
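<p>The expiry behaviour described above can be sketched as a small read-through cache. This is only an illustration in Python: the <code>TTLCache</code> class, the <code>loader</code> callback (standing in for the database query), and the injectable <code>clock</code> are assumptions made for the example, not part of any particular library.</p>

```python
import time


class TTLCache:
    """Minimal read-through cache with a fixed TTL per entry."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # hypothetical stand-in for the DB query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be simulated
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # still valid: served from the cache
        # Missing or expired: recompute and store with a new expiry time.
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value
```

<p>Nothing in this sketch coordinates concurrent callers. Every request that arrives between expiry and refill would call <code>loader</code> itself, which is the weakness the rest of this article addresses.</p>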
<p>But what happens if <strong>millions of users request the same data</strong>?</p>
<p>When the TTL expires, <strong>all those requests suddenly hit the backend</strong>.</p>
<p>This creates a <strong>traffic spike</strong>.</p>
<hr />
<h1>Why Basic TTL Caching is Not Enough</h1>
<p>Basic TTL caching works well for <strong>low-traffic systems</strong>.</p>
<p>But at scale, it creates a dangerous pattern:</p>
<pre><code class="language-plaintext">Cache valid → everything fast
Cache expires → sudden DB spike
Cache refilled → system stable again
</code></pre>
<p>This pattern repeats every TTL interval.</p>
<p>Example architecture:</p>
<pre><code class="language-plaintext">Users
  |
  v
Application Server
  |
  v
Cache (Redis)
  |
  v
Database
</code></pre>
<p>If the cache entry expires:</p>
<pre><code class="language-plaintext">Thousands of requests
        ↓
Cache MISS
        ↓
All requests go to DB
</code></pre>
<p>Result:</p>
<ul>
<li><p>CPU spikes</p>
</li>
<li><p>DB overload</p>
</li>
<li><p>High latency</p>
</li>
<li><p>Potential system crash</p>
</li>
</ul>
<hr />
<h1>Real World Examples</h1>
<p>Large systems often experience this behavior.</p>
<h3>Netflix Release</h3>
<p>When a new season drops, millions of users request the same content metadata simultaneously.</p>
<hr />
<h3>E-commerce Flash Sales</h3>
<p>During Amazon or Flipkart sales, product pages receive massive traffic.</p>
<hr />
<h3>IPL Streaming</h3>
<p>Cricket streaming platforms experience sudden synchronized traffic spikes when a match begins.</p>
<hr />
<h1>Advanced Strategies to Prevent Cache Stampedes</h1>
<p>Modern distributed systems use several techniques to handle this.</p>
<hr />
<h1>1. TTL Jitter (Randomized Expiration)</h1>
<h3>The Idea</h3>
<p>Instead of setting the same TTL for every cache key, we <strong>add randomness to expiration time</strong>.</p>
<p>Example:</p>
<pre><code class="language-plaintext">TTL = 60 seconds ± random(0–10 seconds)
</code></pre>
<p>Now different cache entries expire at slightly different times.</p>
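<p>A minimal sketch of the jitter calculation, assuming a uniform random offset of up to 10 seconds either side of the base TTL. The function name <code>jittered_ttl</code> and its parameters are illustrative, not taken from any specific cache client.</p>

```python
import random


def jittered_ttl(base_seconds=60.0, jitter_seconds=10.0, rng=random):
    """Return the base TTL shifted by a uniform offset in [-jitter, +jitter]."""
    return base_seconds + rng.uniform(-jitter_seconds, jitter_seconds)


# Each key gets its own expiry, somewhere between 50 and 70 seconds.
ttls = {key: jittered_ttl() for key in ("product:1", "product:2", "product:3")}
```

<p>The result would be passed wherever the cache accepts an expiry when writing the key. If the cache only accepts whole seconds, the value would need rounding first.</p>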
<hr />
<h3>Without TTL Jitter</h3>
<pre><code class="language-plaintext">Time →
|----------------60s----------------|

Cache expires
ALL requests spike
</code></pre>
<hr />
<h3>With TTL Jitter</h3>
<pre><code class="language-plaintext">Time →
Key A expires at 55s
Key B expires at 61s
Key C expires at 64s
</code></pre>
<p>Recomputations become <strong>distributed over time</strong>, reducing traffic spikes on the backend.</p>
<hr />
<h3>Why It Works</h3>
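<p>It prevents <strong>synchronized expiration</strong> across keys that were cached at the same moment, which is a common trigger for the thundering herd. On its own it does not protect a single hot key: when that key expires, every concurrent request still misses.</p>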
<hr />
<h1>2. Probability-Based Early Expiration</h1>
<p>This is a more advanced strategy used in large-scale systems.</p>
<p>Instead of waiting for the exact TTL to expire, the cache <strong>sometimes refreshes earlier with a small probability</strong>.</p>
<hr />
<h3>Conceptual Idea</h3>
<p>As the cache entry gets closer to expiry:</p>
<ul>
<li>probability of recomputation increases</li>
</ul>
<p>Example:</p>
<pre><code class="language-plaintext">Cache TTL = 60s

At 40s → small chance of refresh
At 50s → higher chance
At 55s → even higher chance
</code></pre>
<p>So instead of <strong>all requests refreshing at 60s</strong>, refreshes are <strong>spread earlier</strong>.</p>
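<p>One known way to formulate this is the approach often called XFetch, where each request adds a random head start to the current time and refreshes if the result passes the expiry time. The sketch below assumes that formulation. The function name, <code>recompute_seconds</code> (an estimate of how long a refresh takes), and <code>beta</code> (a tuning knob) are names chosen for this example.</p>

```python
import math
import random


def should_refresh_early(now, expires_at, recompute_seconds, beta=1.0, rng=random):
    """Decide whether this request should recompute the value before expiry.

    The random head start scales with recompute_seconds and beta, so the
    chance of an early refresh rises as now approaches expires_at.
    """
    # 1.0 - random() lies in (0, 1], which keeps log() away from zero.
    draw = 1.0 - rng.random()
    head_start = -recompute_seconds * beta * math.log(draw)
    return now + head_start >= expires_at
```

<p>With <code>recompute_seconds=5</code> and <code>beta=1</code>, a request arriving 20 seconds before expiry refreshes with probability of roughly 2%, and one arriving 2 seconds before expiry with roughly 67%. Once any request refreshes the entry, the expiry time moves forward and the probability drops again for everyone else.</p>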
<hr />
<h3>Why This Works</h3>
<p>It smooths the recomputation load across time.</p>
<p>Instead of:</p>
<pre><code class="language-plaintext">1000 requests at 60s
</code></pre>
<p>We get:</p>
<pre><code class="language-plaintext">20 requests at 45s
30 requests at 50s
50 requests at 55s
</code></pre>
<p>Much easier for the system to handle.</p>
<hr />
<h1>3. Mutex / Cache Locking</h1>
<p>This is one of the <strong>most common techniques</strong> used in production.</p>
<hr />
<h3>The Problem</h3>
<p>Without locking:</p>
<pre><code class="language-plaintext">1000 requests
Cache MISS
1000 DB queries
</code></pre>
<hr />
<h3>The Solution</h3>
<p>Allow <strong>only one request</strong> to recompute the cache.</p>
<p>Flow:</p>
<pre><code class="language-plaintext">Request arrives
      ↓
Cache MISS
      ↓
Acquire lock
      ↓
Fetch data from DB
      ↓
Update cache
      ↓
Release lock
</code></pre>
<p>All other requests:</p>
<pre><code class="language-plaintext">wait for cache update
</code></pre>
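<p>The flow above can be sketched for a single process with one lock per key. This is an assumption-laden illustration: <code>LockedCache</code> and <code>loader</code> are hypothetical names, and the lock only coordinates threads inside one application server. The sketch also re-checks the cache after acquiring the lock, so that requests that waited do not recompute the value again.</p>

```python
import threading
import time


class LockedCache:
    """In-process cache where only one caller per key recomputes on a miss."""

    def __init__(self, loader, ttl_seconds=60.0):
        self._loader = loader          # hypothetical stand-in for the DB query
        self._ttl = ttl_seconds
        self._entries = {}             # key -> (value, expires_at)
        self._locks = {}               # key -> lock guarding recomputation
        self._locks_guard = threading.Lock()

    def _fresh_entry(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        entry = self._fresh_entry(key)
        if entry is not None:
            return entry[0]                    # cache hit, no locking needed
        with self._lock_for(key):              # other callers wait here
            entry = self._fresh_entry(key)     # re-check after waiting
            if entry is not None:
                return entry[0]
            value = self._loader(key)          # one caller per expiry gets here
            self._entries[key] = (value, time.monotonic() + self._ttl)
            return value
```

<p>Across several application servers an in-process lock is not enough. A shared lock would be needed there, for example one built on Redis <code>SET</code> with the <code>NX</code> option and an expiry, which this sketch does not cover.</p>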
<hr />
<h3>Result</h3>
<p>Instead of:</p>
<pre><code class="language-plaintext">1000 DB queries
</code></pre>
<p>We get:</p>
<pre><code class="language-plaintext">1 DB query
999 cache hits afterward
</code></pre>
<p>This dramatically reduces backend load.</p>
<hr />
<h1>4. Stale-While-Revalidate (SWR)</h1>
<p>This technique is widely used by <strong>CDNs like Cloudflare and Fastly</strong>, and is described for HTTP in RFC 5861 as the stale-while-revalidate Cache-Control extension.</p>
<hr />
<h3>The Idea</h3>
<p>Even if the cache is technically expired, the system <strong>serves stale data temporarily</strong> while refreshing the cache in the background.</p>
<hr />
<h3>Flow</h3>
<pre><code class="language-plaintext">Request arrives
       ↓
Cache expired
       ↓
Serve stale response
       ↓
Background refresh
</code></pre>
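<p>A minimal in-process sketch of this flow, assuming two windows per entry: a fresh window, and a stale window during which the old value is still served while one background thread refreshes it. <code>SWRCache</code>, <code>fresh_for</code>, <code>stale_for</code>, and <code>loader</code> are illustrative names, not a CDN or library API.</p>

```python
import threading
import time


class SWRCache:
    """Serve stale entries for a grace period while refreshing in the background."""

    def __init__(self, loader, fresh_for, stale_for, clock=time.monotonic):
        self._loader = loader        # hypothetical stand-in for the origin/DB call
        self._fresh_for = fresh_for
        self._stale_for = stale_for
        self._clock = clock
        self._entries = {}           # key -> (value, stored_at)
        self._refreshing = {}        # key -> background refresh thread
        self._guard = threading.Lock()

    def _store(self, key):
        value = self._loader(key)
        with self._guard:
            self._entries[key] = (value, self._clock())
        return value

    def _refresh(self, key):
        try:
            self._store(key)
        finally:
            with self._guard:
                self._refreshing.pop(key, None)

    def get(self, key):
        with self._guard:
            entry = self._entries.get(key)
            if entry is not None:
                value, stored_at = entry
                age = self._clock() - stored_at
                if self._fresh_for >= age:
                    return value                      # fresh hit
                if self._fresh_for + self._stale_for >= age:
                    if key not in self._refreshing:   # at most one refresh per key
                        worker = threading.Thread(
                            target=self._refresh, args=(key,), daemon=True
                        )
                        self._refreshing[key] = worker
                        worker.start()
                    return value                      # stale, served immediately
        # Nothing cached, or too old to serve: load synchronously.
        return self._store(key)
```

<p>The synchronous path at the end is not protected against concurrent misses, so in practice this would likely be combined with the locking approach from the previous section.</p>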
<hr />
<h3>Result</h3>
<p>Users experience:</p>
<ul>
<li><p><strong>low latency</strong></p>
</li>
<li><p><strong>no waiting</strong></p>
</li>
</ul>
<p>While the cache gets updated asynchronously.</p>
<hr />
<h3>Real Example</h3>
<p>CDNs often serve slightly stale content during:</p>
<ul>
<li><p>major sports events</p>
</li>
<li><p>product launches</p>
</li>
<li><p>sudden traffic spikes</p>
</li>
</ul>
<p>Because <strong>availability is more important than perfect freshness</strong>.</p>
<hr />
<h1>5. Cache Warming (Pre-Warming)</h1>
<p>Cache warming means <strong>preloading cache before traffic arrives</strong>.</p>
<p>Instead of waiting for user requests to populate the cache, the system <strong>loads popular data proactively</strong>.</p>
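<p>As a rough sketch, warming can be a job that runs before the event and loads a known list of hot keys. The names <code>warm_cache</code>, <code>get_product</code>, and <code>loader</code> are assumptions for this example, and a plain dictionary stands in for the real cache.</p>

```python
def warm_cache(cache, keys, loader):
    """Load each key into the cache ahead of traffic; return how many were loaded."""
    loaded = 0
    for key in keys:
        if key not in cache:
            cache[key] = loader(key)   # hypothetical DB fetch, done before the spike
            loaded += 1
    return loaded


def get_product(cache, key, loader):
    """Read path: a hit is served from the cache, a miss falls back to the loader."""
    if key in cache:
        return cache[key]
    cache[key] = loader(key)
    return cache[key]
```

<p>How the list of hot keys is chosen (for example from past traffic or a launch catalogue) is outside the scope of this sketch.</p>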
<hr />
<h3>Example</h3>
<p>Before a Netflix release:</p>
<pre><code class="language-plaintext">Popular shows
Trending metadata
Recommendation data
</code></pre>
<p>are preloaded into cache.</p>
<hr />
<h3>Example: E-commerce Sale</h3>
<p>Before a flash sale starts:</p>
<pre><code class="language-plaintext">Product data
Inventory
Pricing
</code></pre>
<p>are pre-cached.</p>
<hr />
<h3>Benefit</h3>
<p>When traffic spikes:</p>
<pre><code class="language-plaintext">Cache HIT
Cache HIT
Cache HIT
</code></pre>
<p>No database overload occurs.</p>
<hr />
<h1>Tradeoffs in Caching Strategies</h1>
<p>Caching strategies must balance three important factors.</p>
<table>
<thead>
<tr>
<th>Factor</th>
<th>Meaning</th>
</tr>
</thead>
<tbody><tr>
<td>Freshness</td>
<td>How up-to-date the data is</td>
</tr>
<tr>
<td>Latency</td>
<td>How fast responses are</td>
</tr>
<tr>
<td>Consistency</td>
<td>Whether all users see the same data</td>
</tr>
</tbody></table>
<hr />
<h3>Example Tradeoffs</h3>
<p><strong>Stale-While-Revalidate</strong></p>
<ul>
<li><p>Pros → low latency</p>
</li>
<li><p>Cons → slightly stale data</p>
</li>
</ul>
<hr />
<p><strong>Cache Locking</strong></p>
<ul>
<li><p>Pros → protects database</p>
</li>
<li><p>Cons → some requests may wait</p>
</li>
</ul>
<hr />
<p><strong>TTL Jitter</strong></p>
<ul>
<li><p>Pros → simple to implement</p>
</li>
<li><p>Cons → doesn't fully eliminate recomputation spikes</p>
</li>
</ul>
<hr />
<h1>When to Use Which Strategy</h1>
<p>Different scenarios require different techniques.</p>
<table>
<thead>
<tr>
<th>Scenario</th>
<th>Recommended Strategy</th>
</tr>
</thead>
<tbody><tr>
<td>Popular content pages</td>
<td>TTL Jitter + SWR</td>
</tr>
<tr>
<td>Expensive DB queries</td>
<td>Mutex Locking</td>
</tr>
<tr>
<td>High-traffic launches</td>
<td>Cache Warming</td>
</tr>
<tr>
<td>Large-scale systems</td>
<td>Probabilistic Expiration</td>
</tr>
</tbody></table>
<p>In practice, <strong>large systems combine multiple strategies</strong>.</p>
<p>Example production setup:</p>
<pre><code class="language-plaintext">TTL Jitter
     +
Mutex Lock
     +
Stale-While-Revalidate
     +
Cache Warming
</code></pre>
<hr />
<h1>Key Takeaways</h1>
<p>Basic TTL caching works for small systems.</p>
<p>But large-scale platforms must handle <strong>traffic spikes and synchronized cache expiry</strong>.</p>
<p>Advanced techniques like:</p>
<ul>
<li><p>TTL Jitter</p>
</li>
<li><p>Probabilistic Early Expiration</p>
</li>
<li><p>Cache Locking</p>
</li>
<li><p>Stale-While-Revalidate</p>
</li>
<li><p>Cache Warming</p>
</li>
</ul>
<p>help prevent the <strong>Thundering Herd Problem</strong> and ensure system stability.</p>
<p>These strategies are commonly used in real-world scenarios like:</p>
<ul>
<li><p>Netflix releases</p>
</li>
<li><p>IPL streaming platforms</p>
</li>
<li><p>E-commerce flash sales</p>
</li>
</ul>
<p>Understanding these concepts is essential for <strong>designing scalable distributed systems and performing well in system design interviews</strong>.</p>
]]></content:encoded></item></channel></rss>