<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Backend Weekly]]></title><description><![CDATA[Explain complex concepts in Backend Engineering, sharing exclusive backend engineering resources, and helping you become a great Backend Engineer.]]></description><link>https://kaperskyguru.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!tzlx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9b724e-bb27-4b40-967d-6804efa6ea64_392x392.png</url><title>Backend Weekly</title><link>https://kaperskyguru.substack.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 15 Jun 2026 16:44:45 GMT</lastBuildDate><atom:link href="https://kaperskyguru.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Solomon Eseme]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[kaperskyguru@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[kaperskyguru@substack.com]]></itunes:email><itunes:name><![CDATA[Solomon Eseme]]></itunes:name></itunes:owner><itunes:author><![CDATA[Solomon Eseme]]></itunes:author><googleplay:owner><![CDATA[kaperskyguru@substack.com]]></googleplay:owner><googleplay:email><![CDATA[kaperskyguru@substack.com]]></googleplay:email><googleplay:author><![CDATA[Solomon Eseme]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How would you design Memory Systems for production-ready AI Agent.]]></title><description><![CDATA[How to give AI agents the ability to remember things across sessions, across users, across time while also architecting the storage, retrieval, and lifecycle 
pipeline underneath it.]]></description><link>https://kaperskyguru.substack.com/p/how-would-you-design-memory-systems</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/how-would-you-design-memory-systems</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 30 May 2026 09:37:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!cuaC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#8220;&#128075;&#8221;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><div><hr></div>
<div class="callout-block" data-callout="true"><p><em>Today's issue is brought to you by <strong><a href="https://masteringbackend.com/">Masteringbackend</a></strong> &#8594; An all-in-one platform that helps backend engineers become highly-paid backend and AI engineers through a practice-based learning approach.</em></p></div><div><hr></div><p>Here's another issue of <strong>Backend Weekly</strong>, your favorite newsletter on mastering AI backend engineering through real-world systems and interview design questions.</p><div class="callout-block" data-callout="true"><p><strong>Before we dive in:</strong></p><p>Have you ever asked your AI agent the same question in two different sessions, only for it to forget the first answer, or forget that it already solved this exact problem for you yesterday?</p><p>That&#8217;s because LLMs are stateless. Every request starts from zero. What&#8217;s missing is memory: the kind that persists across sessions, accumulates knowledge over time, and makes an agent feel like it knows you.</p><p>Memory is not built in. 
It&#8217;s <strong>built by backend engineers like you.</strong></p><p>If you want to be the engineer who builds these systems, not just the one who uses them, join us at Monday's AMA session to learn more about the <strong>&#8220;Build 10 AI Products in 30 Days&#8221;</strong> Bootcamp.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://luma.com/g3o5min6&quot;,&quot;text&quot;:&quot;Register for the Free Webinar&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://luma.com/g3o5min6"><span>Register for the Free Webinar</span></a></p></div><div><hr></div><p><strong>This is the AI Backend Interview Series on Backend Weekly, which now airs every Saturday.</strong></p><p><em>In this series, I will guide you through answering common AI backend engineering interview questions, covering topics such as AI backend system design, vector databases, memory systems, microservices, API design, and databases.</em></p><p><em>Let's get started with episode 4 (<strong><a href="https://kaperskyguru.substack.com/p/understanding-vector-databases-for">Episode 3 Here</a></strong>):</em></p><div><hr></div><h2><strong>The Interview Scenario</strong></h2><p>You&#8217;re in an AI backend interview.</p><p>They ask:</p><blockquote><p><em><strong>&#8220;How would you design a memory system that allows AI agents to remember context across sessions, including user preferences, past interactions, and learned behaviors, at production scale?&#8221;</strong></em></p></blockquote><p>Here&#8217;s how to approach it:</p><div><hr></div><p>Every LLM call your AI agent makes starts from a blank slate.</p><p>The model receives a prompt, generates a response, and immediately forgets everything. On the next call, it has no knowledge of the user or of previous conversations unless you send that context again.</p><p>This is fine for a playground demo. 
It is not fine for production AI agents: real agents that assist customers, manage workflows, or collaborate with engineering teams across days, weeks, and months.</p><p>Memory is what turns a stateless text generator into an agent that actually knows things.</p><p>And here&#8217;s the part that most people miss: </p><p>Memory is not an AI problem. It is a <strong>backend engineering problem</strong>. <br><br>It&#8217;s storage. It&#8217;s retrieval. It&#8217;s lifecycle management. It&#8217;s scoping, access control, and consistency guarantees. That&#8217;s exactly what backend engineers have always done.</p><p>Let&#8217;s start from first principles.</p><h2><strong>Why LLMs Don&#8217;t Have Memory</strong></h2><p>Start here with your interviewer. LLMs process a <strong>context window</strong>, which is a fixed-size token buffer that contains everything the model can &#8220;see&#8221; during a single request. <a href="https://codingscape.com/blog/llms-with-largest-context-windows">As of 2026, context windows range from 128K to 1M+ tokens.</a></p><p>A larger context window does not solve the memory problem. It only delays it. Think of RAM versus disk: the context window is like RAM, fast but limited and wiped after each request, while memory that must survive across sessions needs the equivalent of a disk.</p><p>Here&#8217;s why a bigger window isn&#8217;t enough:</p><ul><li><p><strong>Context windows are ephemeral.</strong> The content disappears after the response is generated. The next API call starts empty, with nothing persisted.</p></li><li><p><strong>Token cost scales linearly.</strong> Stuffing an entire conversation history into every request is expensive. 
A full-context approach on a 200K-token window can cost 10&#8211;50&#215; more than a memory-augmented approach that injects only the relevant facts.</p></li><li><p><strong>Recall degrades with length.</strong> <a href="https://medium.com/@aftab001x/the-context-window-wars-how-ai-companies-went-from-8k-to-10-million-tokens-and-why-it-doesnt-a60dac60f082">Research consistently</a> shows that LLMs perform worse at retrieving specific facts from extremely long prompts. More context doesn&#8217;t mean better understanding; it often means more noise.</p></li></ul><p>The solution is not a bigger context window. The solution is a system that <strong>selectively stores, consolidates, and retrieves</strong> the right information at the right time, and injects only what&#8217;s relevant into a manageable prompt. </p><p><strong>That system is called memory</strong>.</p><h2><strong>Types of Memory</strong></h2><p>By 2026, the AI engineering community has largely converged on three memory types: semantic (facts and preferences), episodic (past interactions), and procedural (learned behaviors and instructions). </p><p>These mirror human cognitive architecture, and that&#8217;s not a coincidence. 
If you think about it for a second, they solve the same fundamental problem: </p><p> <strong>What to remember, at what granularity, and for how long.</strong></p><p>Discuss each with your interviewer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jVPb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jVPb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png 424w, https://substackcdn.com/image/fetch/$s_!jVPb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png 848w, https://substackcdn.com/image/fetch/$s_!jVPb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!jVPb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jVPb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png" width="1434" height="1052" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1052,&quot;width&quot;:1434,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:224588,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/199575449?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jVPb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png 424w, https://substackcdn.com/image/fetch/$s_!jVPb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png 848w, https://substackcdn.com/image/fetch/$s_!jVPb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!jVPb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe3753d-d416-4135-8914-8696535b81ee_1434x1052.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Most production systems implement <strong>semantic and episodic</strong> memory at a minimum. </p><p>Procedural memory, where agents rewrite their own instructions based on experience, is more advanced and appears mainly in frameworks like Letta (formerly MemGPT) and LangMem.</p><div class="callout-block" data-callout="true"><p><strong>Tell your interviewer this:</strong> </p><p>The three memory types are not an either/or choice. A mature production memory system can combine all three, scoring and blending them at retrieval time.</p></div><h2><strong>The Memory Architecture</strong></h2><p>Regardless of framework, most memory systems follow the same four-stage pipeline: extract, consolidate, store, and retrieve. 
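</p><p>One way to make the four stages concrete is to write them down as contracts. The sketch below is a minimal, hypothetical TypeScript outline: every type and function name in it is illustrative rather than any framework&#8217;s API, and the stages are synchronous only to keep it short (real ones would await LLM and database calls).</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">// Hypothetical contracts for the four-stage memory pipeline.
// None of these names come from Mem0, Zep, LangMem, or Letta.
type MemoryType = "semantic" | "episodic" | "procedural";
type Operation = "ADD" | "UPDATE" | "DELETE" | "NOOP";

interface Scope { userId: string; sessionId?: string; agentId?: string; orgId?: string }
interface Fact { text: string; type: MemoryType }
interface MemoryRecord extends Fact { id: string; scope: Scope; createdAt: number }
interface Decision { op: Operation; fact: Fact; targetId?: string } // targetId: record to UPDATE or DELETE

interface MemoryStages {
  extract: (messages: string[]) => Fact[];                                      // Stage 1
  consolidate: (fact: Fact, existing: MemoryRecord[]) => Decision;              // Stage 2
  store: (decision: Decision, scope: Scope) => void;                            // Stage 3
  retrieve: (query: string, scope: Scope, maxTokens: number) => MemoryRecord[]; // Stage 4
}

// Write path, run after a conversation turn: extract -> consolidate -> store.
// The read path is a single stages.retrieve(...) call before each LLM request.
function runWritePath(
  stages: MemoryStages,
  messages: string[],
  scope: Scope,
  existing: MemoryRecord[],
): Decision[] {
  const decisions: Decision[] = [];
  for (const fact of stages.extract(messages)) {
    const decision = stages.consolidate(fact, existing);
    if (decision.op !== "NOOP") stages.store(decision, scope); // NOOP: fact already known
    decisions.push(decision);
  }
  return decisions;
}</code></pre></div><p>Splitting the write path (extract, consolidate, store) from the read path (retrieve) is what lets you run memory writes asynchronously after the response while retrieval stays on the request path. 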
</p><p><strong>This is the architecture you&#8217;ll design as a backend engineer.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cuaC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cuaC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png 424w, https://substackcdn.com/image/fetch/$s_!cuaC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png 848w, https://substackcdn.com/image/fetch/$s_!cuaC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png 1272w, https://substackcdn.com/image/fetch/$s_!cuaC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cuaC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png" width="1434" height="1758" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1758,&quot;width&quot;:1434,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:255149,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/199575449?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cuaC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png 424w, https://substackcdn.com/image/fetch/$s_!cuaC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png 848w, https://substackcdn.com/image/fetch/$s_!cuaC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png 1272w, https://substackcdn.com/image/fetch/$s_!cuaC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc66b0498-328f-441d-a10e-2f61cf600304_1434x1758.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Let me break each stage down for you.</p><h2><strong>[Stage 1 &#8212; Extract]: Turning Conversations into Facts</strong></h2><p>Raw conversation messages are not memory; on their own, they&#8217;re mostly noise. The extraction stage uses an LLM call to distill a conversation into atomic, structured facts.</p><p>With Mem0&#8217;s hosted client, extraction happens inside the <code>add()</code> call:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">// Hosted Mem0 client (API key + snake_case scope options)
import { MemoryClient } from "mem0ai";

const client = new MemoryClient({ apiKey: process.env.MEM0_API_KEY });

// After a conversation, add the messages to memory
await client.add(
  [
    { role: "user",      content: "I work at Acme Corp in the billing team. We use Stripe." },
    { role: "assistant", content: "Got it! I'll keep that in mind for future billing discussions." },
    { role: "user",      content: "Can you always use TypeScript for code examples?" },
    { role: "assistant", content: "Absolutely &#8212; TypeScript it is from now on." },
  ],
  {
    user_id: "user_abc123",   // scoped to this user
    agent_id: "coding-agent", // scoped to this agent
  }
);</code></pre></div><p>Behind the scenes, the extraction step turns those four messages into discrete facts along these lines (illustrative output; the exact wording will vary):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">// Extracted memories (stored as separate records):
// 1. "User works at Acme Corp in the billing team"  - semantic
// 2. "User's company uses Stripe for payments"       - semantic
// 3. "User prefers TypeScript for code examples"     - procedural</code></pre></div><p>Each fact is embedded as a vector, tagged with metadata (user ID, session ID, timestamp, memory type), and stored. Only the distilled facts are stored, not the raw conversation.</p><h2><strong>[Stage 2 &#8212; Consolidate]: Deduplication and Conflict Resolution</strong></h2><p>This is where many memory systems fall short. Without consolidation, memory accumulates contradictions and duplicates over time. </p><p>The user says &#8220;I work at Acme Corp&#8221; in session 1, then &#8220;I just joined Stripe&#8221; in session 30. Now both facts exist in the vector store. </p><p>Which one is true? </p><p>Think about it.</p><p>The consolidation stage compares each extracted fact against existing memories and applies one of four operations:</p><ul><li><p><strong>ADD:</strong> The fact is new, and no similar memory exists. Insert it.</p></li><li><p><strong>UPDATE:</strong> A similar memory exists, but the details have changed. Replace it. For example, &#8220;Works at Acme Corp&#8221; becomes &#8220;Works at Stripe.&#8221;</p></li><li><p><strong>DELETE:</strong> The new fact explicitly contradicts an old one. Remove the old one.</p></li><li><p><strong>NOOP:</strong> The fact already exists in memory. Skip it.</p></li></ul><p>This is typically implemented as a tool-calling pattern: the LLM examines the candidate fact alongside similar existing memories and decides which operation to apply.</p><h2><strong>[Stage 3 &#8212; Store]: The Dual-Store Architecture</strong></h2><p>Production memory systems typically use a <strong>dual-store</strong> architecture. 
</p><p>It pairs a vector store for semantic search with an entity index for structured relationships.</p><p>Let me explain it with an illustration:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">// Vector store: fast semantic retrieval
// "Find memories similar to 'billing API architecture'"
// &#8594; Returns: "User works on billing team", "Uses Stripe", "Prefers microservices"

// Entity index: structured relationship traversal
// "What do I know about entity 'Acme Corp'?"
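// &#8594; Returns: "User works there", "Uses Stripe", "Billing team", "500 employees"</code></pre></div><p>Why both? </p><blockquote><p>This is important. Explain it in detail to your interviewer.</p></blockquote><p>Vectors provide semantic flexibility: you can find related memories even when the wording is completely different. The entity index provides relational integrity: you can traverse relationships between entities without semantic drift.</p><p>For the vector store, discuss whichever option you already know well. Common choices include:</p><ul><li><p>pgvector</p></li><li><p>Qdrant</p></li><li><p>Chroma</p></li><li><p>Pinecone</p></li><li><p>Redis</p></li></ul><p>The entity index doesn&#8217;t necessarily require an external graph database. Mem0, for example, now uses a built-in entity collection: entities are identified at extraction time, stored in a parallel collection, and matched at retrieval time.</p><h2><strong>[Stage 4 &#8212; Retrieve]: Multi-Signal Scoring</strong></h2><p>Retrieval is where memory becomes useful. When a new user message arrives, the system must decide: </p><blockquote><p><em>Which memories are relevant to this specific request?</em></p></blockquote><p>A multi-signal retriever runs three scoring passes in parallel and then fuses their results. In the sketch below, the stores and helper functions are hypothetical placeholders for your own infrastructure:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">// Hypothetical dependencies: wire these to your own embedding model and stores.
type Hit = { id: string; text: string; score: number; createdAt: number };
declare function embed(text: string): Promise&lt;number[]&gt;;
declare function extractEntities(text: string): string[];
declare const vectorStore: { search(q: { embedding: number[]; filter: object; topK: number }): Promise&lt;Hit[]&gt; };
declare const keywordIndex: { search(q: { query: string; filter: object; topK: number }): Promise&lt;Hit[]&gt; };
declare const entityIndex: { search(q: { entities: string[]; filter: object }): Promise&lt;Hit[]&gt; };
declare function fuseScores(
  semantic: Hit[], keyword: Hit[], entity: Hit[],
  opts: { semanticWeight: number; keywordWeight: number; entityWeight: number; recencyDecay: number },
): Hit[];
declare function selectWithinTokenBudget(hits: Hit[], opts: { maxTokens: number }): Hit[];

async function retrieveMemories(query: string, userId: string) {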
  // Run the three retrieval passes concurrently rather than one after another.
  const entities = extractEntities(query); // e.g. ["Stripe", "billing API"]
  const [semanticResults, keywordResults, entityResults] = await Promise.all([
    // 1. Semantic similarity: embed the query, find nearest vectors
    embed(query).then((embedding) =&gt;
      vectorStore.search({ embedding, filter: { user_id: userId }, topK: 20 })
    ),
    // 2. Keyword matching: exact term overlap for precision
    keywordIndex.search({ query, filter: { user_id: userId }, topK: 20 }),
    // 3. Entity matching: match entities from the query against the entity index
    entityIndex.search({ entities, filter: { user_id: userId } }),
  ]);

  // 4. Fuse scores: a weighted blend of the three signals, discounted by recency
  const fused = fuseScores(semanticResults, keywordResults, entityResults, {
    semanticWeight: 0.6,
    keywordWeight: 0.25,
    entityWeight: 0.15,
    recencyDecay: 0.95, // older memories score slightly lower
  });

  // 5. Return the highest-scoring memories that fit within a 200-token budget
  return selectWithinTokenBudget(fused, { maxTokens: 200 });
}</code></pre></div><p>The fused results are injected into the LLM prompt as a system-level context block. To the model, they read as background knowledge rather than search results, which is what makes the agent feel like it <strong>"remembers."</strong></p><p><strong>Next, let&#8217;s talk about who remembers what, so that you can discuss it with your interviewer.</strong></p><h2><strong>Memory Scoping: Who Remembers What</strong></h2><p>In production, you don&#8217;t have one global memory. You have scoped memories that determine who can see what. </p><blockquote><p>Discuss this with your interviewer because it&#8217;s the access control layer of your memory system.</p></blockquote><ul><li><p><strong>User-scoped:</strong> Memories specific to one user. For example, &#8220;This user prefers dark mode.&#8221; Only retrieved when that user is active. This is the most common scope.</p></li><li><p><strong>Session-scoped:</strong> Memories that expire with the session: short-term working memory. For example, &#8220;In this session, we&#8217;re refactoring the auth module.&#8221; Cleared on session end.</p></li><li><p><strong>Agent-scoped:</strong> Knowledge the agent has learned across all users. For example, &#8220;This codebase uses Prisma ORM.&#8221; Retrieved regardless of which user is active, so anything written at this scope must be safe to share with every user.</p></li><li><p><strong>Organization-scoped:</strong> Shared memories across a team. For example, &#8220;Acme Corp&#8217;s coding standards require 80% test coverage.&#8221; Retrieved for any user in that org.</p></li></ul><p>Each memory record is tagged with its scope IDs at write time and filtered by those IDs at read time. </p><p>Conceptually, this is no different from row-level security in PostgreSQL or tenant isolation in a multi-tenant API. 
It&#8217;s the same access control pattern, applied to a new data type.</p><p>Let&#8217;s build a simple flow:</p><h2><strong>A Memory-Augmented Agent</strong></h2><p>Here&#8217;s the complete flow, from user message to memory-augmented response:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">import { MemoryClient } from "mem0ai";
import { OpenAI } from "openai";

const memory = new MemoryClient({ apiKey: process.env.MEM0_API_KEY });
const openai = new OpenAI();

async function chat(userId: string, userMessage: string) {
  // 1. Retrieve relevant memories BEFORE calling the LLM
  const memories = await memory.search(userMessage, { user_id: userId });
  const memoryContext = memories
    .map((m: { memory: string }) =&gt; `- ${m.memory}`)
    .join("\n");

  // 2. Inject memories into the system prompt
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `You are a helpful assistant. Here's what you remember about this user:\n${memoryContext || "No memories yet."}`,
      },
      { role: "user", content: userMessage },
    ],
  });

  // content can be null (for example, when the model returns tool calls), so default to ""
  const assistantMessage = response.choices[0].message.content ?? "";

  // 3. Store new memories from this interaction asynchronously (don't block the response)
  memory.add(
    [
      { role: "user", content: userMessage },
      { role: "assistant", content: assistantMessage },
    ],
    { user_id: userId }
  ).catch((err: Error) =&gt; console.error("Memory write failed:", err));

  return assistantMessage;
}</code></pre></div><p>Three things to notice. </p><ol><li><p>First, memory retrieval happens before the LLM call. Retrieved facts are injected into the system prompt, so the model already "knows" them when it generates a response. </p></li><li><p>Second, memory storage happens after the response, asynchronously, so writing memories never blocks the user&#8217;s response.</p></li><li><p>Third, the <code>.add()</code> call handles extraction, consolidation, and storage internally. You send only raw messages; the framework handles the rest.</p></li></ol><h2><strong>Scaling and Production Concerns</strong></h2><p>Here&#8217;s where you impress your interviewer. </p><p>The happy path of memory is easy. Production-grade memory has edge cases that most engineers don&#8217;t think about until they hit them.</p><h3><strong>Token Budget Management</strong></h3><p>One of the biggest production concerns is <strong>how much memory to inject</strong>. </p><p>Inject too little and the agent misses critical context. Inject too much and you spend tokens that should go to the user&#8217;s actual request.</p><p>A common starting point is a budget of around <strong>200 tokens</strong> for injected memories, which is enough for roughly 5&#8211;8 atomic facts. Your retrieval scoring must rank ruthlessly within that budget.</p><h3><strong>Memory Staleness and Decay</strong></h3><p>From a business perspective, things change in production. Users switch jobs, change preferences, and abandon projects; customers churn and move to other platforms. </p><p>Your memory system needs a <strong>recency decay strategy</strong>.</p><p>For example, score newer memories higher than older ones, or delete a user&#8217;s memories after 30 days of inactivity. 
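</p><p>Recency decay and the token budget from the previous section meet at selection time. Here is a minimal sketch under stated assumptions: relevance scores in [0, 1], exponential decay with a hypothetical per-day factor of 0.95 (mirroring the <code>recencyDecay</code> value in the retrieval example), a 30-day cutoff measured from when a memory was last seen (one interpretation of the inactivity rule above), and a rough four-characters-per-token estimate. None of these names come from a specific library.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">// Hypothetical selection step: recency decay + inactivity cutoff + token budget.
// Not a library API; all names and defaults below are illustrative assumptions.
interface CandidateMemory { text: string; relevance: number; lastSeenMs: number }

const DAY_MS = 24 * 60 * 60 * 1000;

// Exponential decay: each day of age multiplies the score by dailyDecay.
function decayedScore(m: CandidateMemory, nowMs: number, dailyDecay: number): number {
  const ageDays = Math.max(0, (nowMs - m.lastSeenMs) / DAY_MS);
  return m.relevance * Math.pow(dailyDecay, ageDays);
}

// Rough heuristic (~4 characters per token); use your model's tokenizer in production.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function selectMemories(
  candidates: CandidateMemory[],
  nowMs: number,
  { dailyDecay = 0.95, maxIdleDays = 30, maxTokens = 200 } = {},
): string[] {
  const ranked = candidates
    // Expire memories not seen within maxIdleDays.
    .filter((m) => maxIdleDays * DAY_MS >= nowMs - m.lastSeenMs)
    .map((m) => ({ text: m.text, score: decayedScore(m, nowMs, dailyDecay) }))
    .sort((a, b) => b.score - a.score);

  // Greedily keep the highest-scoring facts that still fit the budget.
  const selected: string[] = [];
  let used = 0;
  for (const { text } of ranked) {
    const cost = estimateTokens(text);
    if (used + cost > maxTokens) continue; // too big for what's left; try smaller facts
    selected.push(text);
    used += cost;
  }
  return selected;
}</code></pre></div><p>The greedy loop skips a fact that would overflow the budget but keeps scanning, so a short, lower-ranked fact can still use the remaining space. In production, swap the character heuristic for your model&#8217;s tokenizer. 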
</p><p>Some systems implement explicit expiration: session-scoped memories expire automatically, semantic memories older than N days are deprioritized, and episodic memories are pruned to the most recent K interactions.</p><h3><strong>Privacy and Data Governance</strong></h3><p>Memory creates a persistent record of user interactions. That brings data-protection obligations such as GDPR and, in regulated industries like healthcare, finance, and legal, additional compliance requirements. Your memory system needs:</p><ul><li><p><strong>Delete APIs:</strong> Users must be able to delete their memories (GDPR right to erasure).</p></li><li><p><strong>Scope isolation:</strong> One user&#8217;s memories must never leak into another user&#8217;s retrieval results.</p></li><li><p><strong>Audit trails:</strong> Every memory write, update, and delete must be logged.</p></li><li><p><strong>Encryption at rest:</strong> Memory records contain user data and should be encrypted.</p></li></ul><h3><strong>Observability</strong></h3><p>You can&#8217;t manage what you can&#8217;t see. </p><p>Here are some metrics worth tracking; discuss with your interviewer which ones matter most for their use case:</p><ul><li><p><strong>Memory retrieval latency (p50, p95):</strong> A reasonable target is under 100 ms. If it&#8217;s higher, your vector index may need tuning, or your token-budget filtering may be too expensive.</p></li><li><p><strong>Memory hit rate:</strong> What percentage of queries return at least one relevant memory? A consistently low hit rate often means your extraction pipeline isn&#8217;t capturing useful facts.</p></li><li><p><strong>Token efficiency:</strong> How many tokens do you inject compared with how many go to the user&#8217;s actual request? Track the ratio. As a reference point, full-context approaches can consume 26,000+ tokens per conversation, while memory-augmented approaches average around 7,000, roughly a 73% reduction.</p></li><li><p><strong>Consolidation conflict rate:</strong> How often does a new fact UPDATE or DELETE an existing memory? 
A high conflict rate may indicate noisy extraction or a domain where facts change rapidly.</p></li><li><p><strong>Stale memory retrievals:</strong> How often do retrieved memories turn out to be outdated? Track user corrections (&#8220;Actually, I no longer work at Acme&#8221;) as a signal.</p></li></ul><h2><strong>The Framework Landscape in 2026</strong></h2><p>Discuss these options with your interviewer to show awareness of the production landscape.</p><p>You don&#8217;t need to build memory from scratch. The ecosystem has matured. </p><ul><li><p><strong>Mem0:</strong> The most widely adopted standalone memory layer. 57K+ GitHub stars. Dual-store architecture (vector + entity index). Supports 20+ vector backends. Works with any LLM stack via REST API. YC-backed, $24M Series A. This is the default recommendation for most teams.</p></li><li><p><strong>Zep:</strong> Production-grade memory with hybrid vector + graph storage. Strong session management. Best for long-running agent sessions where temporal ordering matters deeply.</p></li><li><p><strong>LangMem (LangChain):</strong> Built into the LangChain/LangGraph ecosystem. Supports all three memory types, including procedural. Best when you&#8217;re already committed to LangGraph for agent orchestration.</p></li><li><p><strong>Letta (MemGPT):</strong> Tiered memory with self-editing capabilities. The agent can explicitly choose to write, update, or delete its own memories. The most advanced option for long-horizon autonomous agents.</p></li></ul><h2><strong>Final Answer</strong></h2><div class="callout-block" data-callout="true"><p><em>&#8220;I&#8217;d design the memory system as a four-stage pipeline, store memories in a dual-store architecture that combines vector search for semantic retrieval with an entity index for relational traversal, and retrieve them using multi-signal scoring that fuses semantic similarity, keyword matching, and entity matching. 
Memory is scoped by user, session, agent, and organization to enforce isolation. For production hardening, I&#8217;d add recency decay scoring, GDPR-compliant delete APIs, scope-level isolation, and observability on retrieval latency, hit rate, and token efficiency.&#8221;</em></p></div><p>Designing a memory system for AI agents <em>sounds</em> like a problem for ML engineers and AI researchers. But as you dig in, you realize it&#8217;s built on the same foundations you&#8217;ve been working with for years:</p><ul><li><p><strong>Write-ahead pipelines:</strong> Extract, consolidate, store. The same pattern as event sourcing.</p></li><li><p><strong>Dual-store retrieval:</strong> Vector + structured index. The same architecture as search systems that combine full-text with filters.</p></li><li><p><strong>Scope-based access control:</strong> User, session, org. The same isolation model as multi-tenant APIs.</p></li><li><p><strong>Token budget management:</strong> Deciding what to include and what to leave out. The same constraint as API response pagination.</p></li><li><p><strong>Lifecycle management:</strong> Expiration, decay, pruning. The same TTL logic as cache eviction.</p></li></ul><p>Memory for AI agents is not exotic new technology. </p><p>It is <strong>backend infrastructure</strong> for storage, retrieval, access control, and lifecycle, applied to a new data type: the things an agent has learned.</p><p>So the next time an interviewer asks, <strong>&#8220;How would you give an AI agent memory?&#8221;</strong> don&#8217;t just say &#8220;I&#8217;d use a vector database.&#8221;</p><p>Walk them through the extraction pipeline. Explain how consolidation resolves conflicting facts. Show them the multi-signal scoring function. 
Talk about what happens when a user exercises their right to be forgotten: every memory scoped to their ID must be deleted, within GDPR&#8217;s one-month deadline.</p><p>That&#8217;s the answer that shows you&#8217;ve built real systems, not just plugged in a library.</p><div><hr></div><p>I hope you learned something today: <strong>Spread the love.</strong> Share this newsletter with at least two of your friends today.</p><p>Also, let me know if you enjoy this series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><div class="callout-block" data-callout="true"><p><strong>Whenever you&#8217;re ready</strong></p><p>There are 3 ways I can help you become a great backend engineer:</p><p><strong>1. <a href="https://app.masteringbackend.com/">The MB Platform:</a></strong> Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a></strong> The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. <a href="https://getbackendjobs.com/">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p></div><div><hr></div><p><strong>LAST WORD &#128075;</strong></p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? 
Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello &#8212; I&#8217;d love to hear from you!</p><p><em><strong>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</strong></em></p>]]></content:encoded></item><item><title><![CDATA[Understanding Vector Databases for Backend Engineers]]></title><description><![CDATA[What vector databases actually are, how they work under the hood, when to use one instead of Postgres, and how to explain it all with confidence in your next backend interview.]]></description><link>https://kaperskyguru.substack.com/p/understanding-vector-databases-for</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/understanding-vector-databases-for</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Thu, 21 May 2026 09:30:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!LufJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#8220;&#128075;&#8221;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><div><hr></div><div class="callout-block" data-callout="true"><p><em>Today's issue is brought to you by <strong><a href="https://masteringbackend.com/">Masteringbackend</a></strong> &#8594; An all-in-one platform that helps backend engineers become highly-paid backend and AI engineers by leveraging a practical-based learning approach.</em></p></div><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://kaperskyguru.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading 
Backend Weekly! Subscribe for free to receive new posts and support my work.</p></div></div></div><div><hr></div><p>Here's another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and interview design questions.</p><div class="callout-block" data-callout="true"><p><strong>Before we dive in:</strong></p><p>Every AI-powered feature you&#8217;ve seen in the last two years &#8212; semantic search, chatbots that answer from your docs, recommendation engines that actually understand context &#8212; has the same thing under the hood: <strong>vector embeddings stored in a database designed to search them.</strong></p><p>If you&#8217;re a backend engineer building anything that touches AI, understanding vector databases is no longer optional. It&#8217;s the data layer you&#8217;ll be asked about in your next interview, and the one your team will ask you to architect next quarter.</p><p>That&#8217;s why we built the <strong>&#8220;Golang30 AI Bootcamp&#8221;</strong> to help you learn how to build 10 production-ready AI projects in Golang. 30 days. Real systems. Real production patterns.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://masteringbackend.com/golang30&quot;,&quot;text&quot;:&quot;Join the AI Engineering Bootcamp &#8594;&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://masteringbackend.com/golang30"><span>Join the AI Engineering Bootcamp &#8594;</span></a></p></div><div><hr></div><p><strong>This is the AI Backend Engineer Series on Backend Weekly. 
</strong></p><p><em>In this series, I will guide you through understanding and learning AI backend engineering. Let&#8217;s get started with episode 3 of this series. <a href="https://kaperskyguru.substack.com/p/understanding-mcp-for-backend-engineers?r=6bmjcc">Read episode 2 here</a>.</em></p><div><hr></div><div class="callout-block" data-callout="true"><p>You&#8217;re in an AI Engineer interview.</p><p>They ask:</p><p><em><strong>&#8220;What is a vector database, how does it work under the hood, and when would you use one instead of a traditional database in a backend AI system?&#8221;</strong></em></p><p>Here&#8217;s how to approach it:</p></div><p>Every database you&#8217;ve ever used answers the same kind of question: &#8220;Give me the row where <code>id = 42</code>.&#8221; Or &#8220;Give me all rows where <code>status = 'active'</code> and <code>created_at &gt; '2025-01-01'</code>.&#8221;</p><p>These are <strong>exact-match</strong> queries: the database compares stored values against precise conditions, and either a row matches or it doesn&#8217;t. This works brilliantly for structured data, and it&#8217;s the foundation of every production system you&#8217;ve built.</p><p>But what if the question isn&#8217;t exact?</p><p>&#8220;<strong>Find me documents that are <em>about</em> contract disputes, even if the word &#8216;contract&#8217; never appears.</strong>&#8221; Or &#8220;<strong>Find me products <em>similar to</em> this one.</strong>&#8221; Or &#8220;<strong>Find the 10 most relevant knowledge base articles for this customer question.</strong>&#8221;</p><p>Traditional databases can&#8217;t answer these questions well, because they compare values, not meaning.</p><p>Vector databases can. 
</p><p>As a backend engineer building AI-powered features in 2026, you need to understand how.</p><h2><strong>What Is a Vector Database?</strong></h2><p>Start here with your interviewer:</p><div class="callout-block" data-callout="true"><p>A vector database is a database system purpose-built for storing, indexing, and searching <strong>high-dimensional vectors</strong>, most commonly <strong>embeddings</strong>.</p></div><p>An embedding is a numerical representation of a piece of data (a sentence, an image, a product, or a user profile) produced by a machine learning model. The model converts the <em>meaning</em> of the data into a list of numbers, commonly 768 to 1536 dimensions.</p><p>The key property is that <strong>things that are semantically similar end up close together in vector space.</strong> </p><p>The sentences &#8220;<strong>My order hasn&#8217;t arrived</strong>&#8221; and &#8220;<strong>Where is my package?</strong>&#8221; produce embeddings that are close to each other, even though they share almost no words.</p><p>A vector database does one thing exceptionally well: searching embeddings efficiently.</p><p>Given a query vector, it finds the <strong>K most similar vectors</strong> in the database. This process is called <strong>similarity search</strong>, and it powers semantic search, RAG pipelines, recommendation systems, anomaly detection, and multi-modal search across text, images, and audio.</p><div class="callout-block" data-callout="true"><p><strong>Discuss this with your interviewer:</strong> </p><p>The core difference between a traditional database and a vector database is the <em>type of question</em> it answers. A traditional database answers "find the exact match." 
A vector database answers "find the closest meaning."</p></div><h2><strong>How Vector Search Works Under the Hood</strong></h2><p>When a query arrives at a vector database, this is what happens:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LufJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LufJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png 424w, https://substackcdn.com/image/fetch/$s_!LufJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png 848w, https://substackcdn.com/image/fetch/$s_!LufJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png 1272w, https://substackcdn.com/image/fetch/$s_!LufJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LufJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png" width="1414" height="918" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:918,&quot;width&quot;:1414,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:119478,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/198667382?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LufJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png 424w, https://substackcdn.com/image/fetch/$s_!LufJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png 848w, https://substackcdn.com/image/fetch/$s_!LufJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png 1272w, https://substackcdn.com/image/fetch/$s_!LufJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06b9f90c-743f-45d9-8012-e8280c064f6f_1414x918.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The critical step is the <strong>index lookup</strong>.</p><p>Without an index, finding the most similar vectors requires comparing the query against every single vector in the database. That&#8217;s a brute-force scan, <code>O(n)</code> per query: fine for a few thousand vectors, but prohibitively slow once you reach millions.</p><p>This is where <strong>Approximate Nearest Neighbor (ANN)</strong> algorithms come in. They trade a small amount of recall for massive speed improvements, finding results that are &#8220;close enough&#8221; to the true nearest neighbors in a fraction of the time.</p><h2><strong>The Indexing Algorithms You Need to Know</strong></h2><p>Three indexing algorithms come up in interviews. 
You don&#8217;t need to implement them from scratch, but you need to understand what they do, when to use them, and their trade-offs.</p><h3><strong>HNSW (Hierarchical Navigable Small World)</strong></h3><p>HNSW is the most widely used ANN index in production today. It builds a multi-layered graph where each node is a vector and edges connect similar vectors. The top layers have long-range connections (for fast global navigation), and the bottom layers have short-range connections (for precise local search).</p><p>Think of it like a skip list, but in high-dimensional space. You start at the top layer, make big jumps to get close to the target region, then descend layer by layer to find the (approximate) nearest neighbors.</p><ul><li><p><strong>Speed:</strong> Excellent query performance, often sub-millisecond at millions of vectors.</p></li><li><p><strong>Recall:</strong> Very high, typically 95&#8211;99%+ with proper tuning.</p></li><li><p><strong>Trade-off:</strong> High memory usage: the graph performs best when it fits entirely in RAM. Slower build times than IVF.</p></li><li><p><strong>Best for:</strong> Most production workloads under 50M vectors where query speed matters most.</p></li></ul><h3><strong>IVFFlat (Inverted File Index)</strong></h3><p>IVF works by clustering your vectors into groups using k-means, then searching only the clusters closest to the query vector. Instead of scanning every vector, it scans only the relevant clusters, dramatically reducing the search space.</p><ul><li><p><strong>Speed:</strong> Good, but slower than HNSW for most workloads.</p></li><li><p><strong>Recall:</strong> Depends heavily on the number of clusters scanned. More clusters = higher recall = slower search.</p></li><li><p><strong>Trade-off:</strong> Requires a training step: you need existing data before building the index, so it&#8217;s a poor fit for tables that start empty.</p></li><li><p><strong>Best for: </strong>Large datasets where memory is constrained. 
Often combined with Product Quantization (PQ) for compression.</p></li></ul><h3><strong>Flat (Brute Force)</strong></h3><p>Flat search uses no index at all. It compares the query against every single vector: 100% recall, but <code>O(n)</code> scan time.</p><p>Flat search is best for small datasets (under 10K vectors), for benchmarking the recall of other indexes, or when perfect accuracy is required.</p><div class="callout-block" data-callout="true"><p><strong>Tell your interviewer this:</strong> </p><p>HNSW is the default choice for most production systems. Use IVF when you have billions of vectors and memory is the constraint. Use flat only when the dataset is small enough that brute force is fast.</p></div><h2><strong>Distance Metrics: How &#8220;Similar&#8221; Is Defined</strong></h2><p>The database needs a function to measure how close two vectors are. Three distance metrics dominate:</p><ul><li><p><strong>Cosine Similarity:</strong> Measures the angle between two vectors. Ignores magnitude, focuses on direction. This is the default for most text embedding models, such as those from OpenAI, Cohere, and Sentence Transformers, many of which normalize their outputs. Use this unless you have a specific reason not to.</p></li><li><p><strong>L2 (Euclidean) Distance:</strong> Measures the straight-line distance between two points. Considers both direction and magnitude. Better when magnitude carries meaning, like user activity intensity.</p></li><li><p><strong>Dot Product (Inner Product):</strong> A fast alternative to cosine when vectors are already normalized; for unit-length vectors the two produce identical rankings. Often used by recommendation systems.</p></li></ul><p>When building the index, you must specify which distance metric to use, and your queries must use the same one. 
Mixing them is a silent bug that&#8217;s hard to catch: depending on the system, results get ranked by the wrong metric or, as in pgvector, the query silently skips the index.</p><h2><strong>Vector Search with pgvector</strong></h2><p>Here&#8217;s where it gets practical.</p><p>For most backend teams, the right starting point is not a dedicated vector database but <strong>pgvector</strong>, the PostgreSQL extension that adds vector columns, distance operators, and ANN indexing to the database you&#8217;re already running.</p><p>Here&#8217;s why:</p><p>Your documents and embeddings live in the same table, in the same transaction, with no sync pipeline or extra credentials. Additionally, your team has no new service to monitor. For workloads under 5 million vectors, pgvector&#8217;s performance is typically more than adequate.</p><h3><strong>Setting Up pgvector</strong></h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a table with a vector column
CREATE TABLE documents (
  id       SERIAL PRIMARY KEY,
  title    TEXT NOT NULL,
  content  TEXT NOT NULL,
  metadata JSONB DEFAULT '{}',
  embedding VECTOR(1536)  -- text-embedding-3-small outputs 1536 dims by default
);
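
-- Optional (an illustrative addition): a B-tree expression index for the
-- metadata filter used in the hybrid search query. For a selective filter,
-- Postgres can use it to find the few matching rows, then sort them exactly.
CREATE INDEX ON documents ((metadata-&gt;&gt;'product'));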

-- Create an HNSW index for cosine similarity
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
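
-- Query-time recall knob (pgvector default: 40): how many candidates HNSW
-- explores per query. Higher = better recall, slower queries; keep it at
-- or above your LIMIT. SET applies to the current session only.
SET hnsw.ef_search = 100;

-- Match the query operator to the index's operator class, or the index is
-- skipped: vector_cosine_ops = &lt;=&gt;, vector_l2_ops = &lt;-&gt;,
-- vector_ip_ops = &lt;#&gt; (negative inner product).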

-- m = max connections per node (higher = more accurate, more memory)
-- ef_construction = build-time search depth (higher = better index, slower build)</code></pre></div><h3><strong>Inserting Embeddings</strong></h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">import { OpenAI } from "openai";
import { Pool } from "pg";

const openai = new OpenAI();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function insertDocument(title: string, content: string) {
  // 1. Generate the embedding from the content
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: content,
  });

  const embedding = response.data[0].embedding; // float[] of 1536 dims
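
  // Fail fast if the model's output size doesn't match the VECTOR(1536)
  // column (e.g. after switching models); Postgres would reject the row anyway.
  if (embedding.length !== 1536) {
    throw new Error(`Expected 1536 dimensions, got ${embedding.length}`);
  }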

  // 2. Store both the content AND the embedding in the same row.
  //    JSON.stringify yields "[0.1,0.2,...]", the text format pgvector parses.
  await pool.query(
    `INSERT INTO documents (title, content, embedding)
     VALUES ($1, $2, $3)`,
    [title, content, JSON.stringify(embedding)]
  );
}</code></pre></div><h3><strong>Querying: Semantic Search</strong></h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">async function semanticSearch(query: string, limit: number = 5) {
  // 1. Embed the query with the SAME model used for documents
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });

  const queryEmbedding = response.data[0].embedding;

  // 2. Find the closest vectors using cosine distance (&lt;=&gt;) 
  const result = await pool.query(
    `SELECT id, title, content,
            1 - (embedding &lt;=&gt; $1::vector) AS similarity
     FROM documents
     ORDER BY embedding &lt;=&gt; $1::vector
     LIMIT $2`,
    [JSON.stringify(queryEmbedding), limit]
  );

  return result.rows;
  // Returns: [{ id, title, content, similarity: 0.92 }, ...]
}</code></pre></div><p>Notice the <code>&lt;=&gt;</code> operator. </p><p>That&#8217;s pgvector&#8217;s cosine distance operator. Because the <code>ORDER BY</code> uses the same operator as the index&#8217;s operator class (<code>vector_cosine_ops</code>) and the query has a <code>LIMIT</code>, PostgreSQL&#8217;s query planner can use the HNSW index automatically. You get ANN search through standard SQL, in the same transaction as your regular queries.</p><h3><strong>Hybrid Search: Combining Vector and Metadata Filters</strong></h3><p>In production, pure vector search is rarely enough. You almost always need to combine semantic similarity with traditional filters: by user, by date, by category, or by tenant. Discuss this with your interviewer.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// Find support articles similar to a question, but only for a specific product
// (queryEmbedding is produced exactly as in semanticSearch above)
const result = await pool.query(
  `SELECT id, title, content,
          1 - (embedding &lt;=&gt; $1::vector) AS similarity
   FROM documents
   WHERE metadata-&gt;&gt;'product' = $2
     AND metadata-&gt;&gt;'status' = 'published'
   ORDER BY embedding &lt;=&gt; $1::vector
   LIMIT 10`,
  [JSON.stringify(queryEmbedding), "billing-api"]
);</code></pre></div><p>This is one of pgvector's biggest advantages: the WHERE clause and the vector search happen in the same query, in the same transaction, against the same table, with no sync pipeline between a metadata store and a vector store.</p><p>One caveat: with an approximate index such as HNSW, pgvector applies the <code>WHERE</code> filter after the index scan, so a selective filter can return fewer rows than the <code>LIMIT</code>. Raising <code>hnsw.ef_search</code>, enabling iterative index scans (pgvector 0.8.0 and later), or indexing the filtered column mitigates this.</p><h2><strong>pgvector vs Dedicated Vector Databases</strong></h2><p>This is where many candidates stumble. They either always say &#8220;use Pinecone&#8221; or always say &#8220;use Postgres.&#8221; The right answer depends on the workload. Walk your interviewer through the decision:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kN0X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kN0X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png 424w, https://substackcdn.com/image/fetch/$s_!kN0X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png 848w, https://substackcdn.com/image/fetch/$s_!kN0X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png 1272w, https://substackcdn.com/image/fetch/$s_!kN0X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!kN0X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png" width="1430" height="970" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:970,&quot;width&quot;:1430,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207848,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/198667382?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kN0X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png 424w, https://substackcdn.com/image/fetch/$s_!kN0X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png 848w, https://substackcdn.com/image/fetch/$s_!kN0X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png 1272w, https://substackcdn.com/image/fetch/$s_!kN0X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F814ed6ca-d249-462b-af5c-ff99e146c7b1_1430x970.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="callout-block" data-callout="true"><p><strong>Rule of thumb:</strong> </p><p>Start with pgvector. It handles more than most teams realize. 
Move to a dedicated vector database when you hit a scaling ceiling that Postgres can't solve: hundreds of millions of vectors, auto-scaling requirements, or multi-tenant isolation at the vector level.</p></div><h2><strong>Real-World Use Cases for Backend Engineers</strong></h2><p>Don&#8217;t just list use cases; explain them in terms your interviewer will recognize as production systems:</p><ul><li><p><strong>RAG (Retrieval-Augmented Generation):</strong> The most common use case. You embed your knowledge base and store the vectors. When a user asks a question, you embed the question, find the K most relevant documents via vector search, and feed those documents to the LLM as context. The LLM answers from your data rather than only from its training set. This is how most &#8220;chat with your docs&#8221; features work.</p></li><li><p><strong>Semantic search:</strong> Traditional keyword search only matches documents that share terms with the query. With vector search, a query for &#8220;how to cancel my subscription&#8221; can surface articles about &#8220;membership termination&#8221; and &#8220;account deactivation&#8221;, because their embeddings are close together in meaning.</p></li><li><p><strong>Recommendation engines:</strong> Embed your products, your users, and their interactions, then find the products whose embeddings are closest to what a user has engaged with. Shopify uses a hybrid vector + keyword search for product discovery at scale.</p></li><li><p><strong>Anomaly detection:</strong> Embed your normal system behavior. When a new data point is far from all known embeddings, flag it as anomalous. This pattern is used in fraud detection and security monitoring.</p></li><li><p><strong>Deduplication:</strong> Find near-duplicate content (support tickets, product listings, user-generated posts) that exact-match comparison would miss.</p></li></ul><h2><strong>Observability and Production Concerns</strong></h2><p>You can&#8217;t manage what you can&#8217;t see. 
Discuss these with your interviewer:</p><h3><strong>Metrics to Track</strong></h3><ul><li><p><strong>Query latency (p50, p95, p99):</strong> On a well-tuned index, vector search typically completes in 10&#8211;50ms. If it regularly exceeds 200ms, your index parameters likely need tuning, or your dataset has outgrown a single instance.</p></li><li><p><strong>Recall accuracy:</strong> Periodically compare exact (brute-force) search results against your ANN index results to verify that recall hasn&#8217;t degraded. Target 95%+ for most applications.</p></li><li><p><strong>Embedding generation latency:</strong> The API call to your embedding model (OpenAI, Cohere) is often the bottleneck, not the vector search itself. Track it separately.</p></li><li><p><strong>Index build time:</strong> HNSW indexes can take minutes to hours to build on large datasets. Track rebuild durations and plan around them.</p></li><li><p><strong>Memory usage:</strong> HNSW indexes perform best when they fit in RAM. Monitor memory consumption as your vector count grows. A 1536-dimension vector at 32-bit precision is 1536 &#215; 4 bytes &#8776; 6KB, so 1 million vectors need ~6GB for the raw vectors alone, before the graph&#8217;s link overhead.</p></li></ul><h3><strong>Common Production Pitfalls</strong></h3><ul><li><p><strong>Embedding model mismatch:</strong> If you embed your documents with <code>text-embedding-3-small</code> but embed queries with <code>text-embedding-ada-002</code>, the results will be garbage. Both models produce 1536-dimension vectors, so nothing errors out; the similarity scores are simply meaningless. Always use the same model for both insertion and query, and store the model name as metadata.</p></li><li><p><strong>Stale embeddings:</strong> When your source content changes, its embedding doesn&#8217;t update automatically. Build a re-embedding pipeline triggered by content updates.</p></li><li><p><strong>Missing distance metric alignment:</strong> If you build your HNSW index with <code>vector_cosine_ops</code> but order your query by the L2 distance operator <code>&lt;-&gt;</code> instead of the cosine distance operator <code>&lt;=&gt;</code>, the index won&#8217;t be used and PostgreSQL will fall back to a sequential scan. 
Always match the query operator to the operator class the index was built with.</p></li></ul><div class="callout-block" data-callout="true"><p><strong>Final Answer</strong></p><p><em>&#8220;A vector database is purpose-built for storing and searching high-dimensional embeddings by semantic similarity rather than exact match. Under the hood, it uses approximate nearest neighbor (ANN) algorithms, primarily HNSW, to find the K nearest neighbors in sub-linear time. For most backend teams, I&#8217;d start with pgvector: it adds vector columns, cosine distance operators, and HNSW indexing directly to PostgreSQL, so embeddings and metadata live in the same table, same transaction, same query. This avoids the sync pipeline and operational overhead of a separate vector store. I&#8217;d move to a dedicated solution like Pinecone or Weaviate only when the workload exceeds what a single Postgres instance can handle, such as hundreds of millions of vectors or very high concurrent search throughput. The key production concerns are embedding model consistency, index memory budgeting, and recall monitoring.&#8221;</em></p></div><p>Vector databases sound like an AI-specific topic, something for ML engineers rather than backend engineers. But as you dig in, you realize the core challenges are deeply familiar:</p><ul><li><p>Choosing the right index type for a workload is the same trade-off you&#8217;ve made with B-trees vs hash indexes vs GIN</p></li><li><p>Memory budgeting is the same capacity planning you do for any in-memory data structure</p></li><li><p>Keeping data consistent with its derived representations is the same challenge as maintaining materialized views or search indexes</p></li><li><p>Hybrid querying combines a new query pattern with your existing relational data in the same transaction</p></li><li><p>Observability of latency, recall, and throughput is the same metrics discipline you apply to any production system</p></li></ul><p>Vector databases are not some exotic new technology that requires you to unlearn everything you know. 
They apply the same storage engineering principles you&#8217;ve been using for years to a new data type and a new class of query.</p><p>So the next time an interviewer asks, <strong>&#8220;What is a vector database and when would you use one?&#8221;</strong>, don&#8217;t just say &#8220;it&#8217;s a database for AI embeddings.&#8221;</p><p>Walk them through how HNSW builds its multi-layered graph. Explain why cosine similarity is the usual default for text embeddings. Show them the pgvector query that combines a WHERE clause with a vector search in a single SQL statement. Tell them exactly when you&#8217;d outgrow Postgres and why.</p><p>That&#8217;s the answer that shows you understand the engineering underneath, not just the buzzword on top.</p><div><hr></div><p>I hope you learned something today. <strong>Spread the love:</strong> share this newsletter with at least two of your friends today.</p><p>Also, let me know if you enjoy this series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p><strong>Remember to start learning backend engineering from our courses:</strong></p><p>Get a <strong>50% discount</strong> on any of these courses. 
Reach out to me (reply to this email).</p><ol><li><p><strong><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></strong></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><strong><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/books">All Backend Books</a></strong></p></li><li><p><strong><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/community">Join our Community</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></strong></p></li></ol><div><hr></div><div class="callout-block" data-callout="true"><p><strong>Whenever you&#8217;re ready</strong></p><p>There are 3 ways I can help you become a great backend engineer:</p><p><strong>1. <a href="https://app.masteringbackend.com/">The MB Platform:</a></strong> Join 4000+ backend engineers learning on the MB platform. Build real-world backend projects, track your learning, set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a></strong> The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. <a href="https://getbackendjobs.com/">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. 
Lastly, you can hire backend engineers anywhere in the world.</p></div><div><hr></div><p><strong>LAST WORD &#128075;</strong></p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello &#8212; I&#8217;d love to hear from you!</p><p><em><strong>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</strong></em></p>]]></content:encoded></item><item><title><![CDATA[Understanding MCP for Backend Engineers]]></title><description><![CDATA[What the Model Context Protocol actually is, how it works under the hood, and how to build an MCP server that exposes your backend services to AI agents, explained the way backend engineers actually]]></description><link>https://kaperskyguru.substack.com/p/understanding-mcp-for-backend-engineers</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/understanding-mcp-for-backend-engineers</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 25 Apr 2026 15:49:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YP2_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#8220;&#128075;&#8221;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><div><hr></div><div class="callout-block" data-callout="true"><p><em>Today's issue is brought to you by <strong><a href="https://masteringbackend.com/">Masteringbackend</a></strong> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and AI engineers by leveraging a practical learning approach.</em></p></div><div><hr></div><div 
class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://kaperskyguru.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Backend Weekly! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>Here's another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and interview design questions.</p><h4><strong>Before we dive in:</strong></h4><div class="callout-block" data-callout="true"><h5><strong>Give Your AI Agent Eyes on the Web</strong></h5><p>Here&#8217;s something nobody talks about when they recommend MCP servers:</p><p>3 MCP servers consumed 143,000 of 200,000 tokens before an agent read its first message. That&#8217;s 72% of your context window gone &#8212; on tool schemas the agent never even touched.</p><p>There&#8217;s a simpler architecture.</p><p><strong>Bright Data CLI</strong> gives coding agents like Claude Code, Cursor, and Copilot direct access to real-time web data &#8212; straight from the terminal. No server setup. No schema bloat. No OAuth flow. 
Just one command:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;markdown&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-markdown">brightdata scrape https://any-website.com &#8594; structured JSON</code></pre></div><p>Scrape any URL with automatic CAPTCHA bypass. Search Google, Bing, and Yandex. Extract structured data from 40+ platforms &#8212; Amazon, LinkedIn, Instagram, TikTok, YouTube, Reddit, and more.</p><p>One install. Works with 46+ AI agents. 10&#8211;32x cheaper than MCP for the same tasks.</p><p>It&#8217;s open source. Go check it out &#8212; and drop a star while you&#8217;re there. &#11088;</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://github.com/luminati-io/brightdata-cli&quot;,&quot;text&quot;:&quot;&#8594; Star on GitHub&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://github.com/luminati-io/brightdata-cli"><span>&#8594; Star on GitHub</span></a></p></div><div><hr></div><p><strong>This is the AI Backend Engineer Series on Backend Weekly. </strong><em>In this series, I will guide you through understanding and learning AI backend engineering. </em></p><p><em>Let's get started with episode 2 of this series. <a href="https://kaperskyguru.substack.com/p/how-to-debug-ai-backend-systems">Episode 1 here</a></em></p><div><hr></div><h2><strong>What is MCP?</strong></h2><p>For most of software engineering history, APIs had one type of consumer: code written by humans, for humans to trigger. A mobile app, a browser, a cron job &#8212; all driven by human intent at some level.</p><p>That assumption is broken.</p><p>AI agents are now consumers of your backend. 
They discover capabilities, make decisions, call your services, and act on the results autonomously, at scale, without a human writing the integration code on each side.</p><p>The Model Context Protocol (MCP) is the open standard that makes this work reliably. Anthropic released it in November 2024. By March 2025, OpenAI had adopted it, and Google DeepMind followed. In December 2025, the protocol moved to the Linux Foundation&#8217;s Agentic AI Foundation, co-founded by Anthropic, OpenAI, and Block. When three competing AI labs agree on a standard, that standard tends to stick.</p><p>Every backend engineer building AI-integrated systems needs to understand MCP. </p><p>Let&#8217;s break it down.</p><h2><strong>Understand the Problem MCP Solves</strong></h2><p>Before MCP, connecting an AI model to an external system meant writing custom integration code for every (model, tool) pair. </p><p>If you wanted Claude to query your PostgreSQL database, you wrote an integration. </p><p>If you wanted ChatGPT to do the same, you wrote a different one. Add three more AI models, and you need three more integrations per tool.</p><p>This is often called the <strong>N&#215;M integration problem</strong>. </p><p>With N AI models and M external tools, you end up writing N&#215;M custom connectors, each with its own authentication handling, schema definitions, error formats, and maintenance burden.</p><p>MCP solves it by collapsing N&#215;M to N+M. Each AI host implements the MCP client side once, and you write one MCP server per tool that any MCP-compatible host can use without modification. With 4 models and 10 tools, that is 4 + 10 = 14 components to maintain instead of 4 &#215; 10 = 40 custom connectors.</p><div class="callout-block" data-callout="true"><p><strong>The analogy that sticks:</strong> MCP is to AI agents what the USB standard is to hardware peripherals. Before USB, every device needed a proprietary port. After USB, any device works with any host. MCP does the same for AI models and backend services.</p></div><p>It's also worth understanding what MCP is <em>not</em>. </p><p>MCP is not an agent framework. 
It does not decide <em>when</em> to call a tool or <em>why</em>. It does not replace orchestration layers like LangChain or LangGraph. MCP is a standardized <strong>integration layer</strong> &#8212; the protocol that sits between an AI model and the external world, defining how capabilities are discovered and invoked.</p><h2><strong>The Architecture: Hosts, Clients, and Servers</strong></h2><p>MCP defines three roles. Understanding them clearly is the key to the entire protocol. Discuss each one with your interviewer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YP2_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YP2_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png 424w, https://substackcdn.com/image/fetch/$s_!YP2_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png 848w, https://substackcdn.com/image/fetch/$s_!YP2_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!YP2_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!YP2_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png" width="1380" height="1112" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1112,&quot;width&quot;:1380,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:228114,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/194881760?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9698f9fd-8225-4a99-add7-9342a6bea78d_1380x1112.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YP2_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png 424w, https://substackcdn.com/image/fetch/$s_!YP2_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png 848w, https://substackcdn.com/image/fetch/$s_!YP2_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!YP2_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806ed7c5-5af2-4912-b944-3a18737b768b_1380x1112.png 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The <strong>host</strong> is the AI application the user interacts with, such as Claude Desktop, an IDE, or your own agent. The host runs one <strong>client</strong> per connection, and each client talks to exactly one <strong>server</strong>: the program that exposes your tools and data. The key insight is this <strong>1:1 relationship between client and server</strong>. A single host can contain many clients, each connected to a different MCP server. Your agent can be connected to a GitHub server, a PostgreSQL server, your internal API server, and a Slack server at the same time, and the host coordinates all of them.</p><h2><strong>The Three Primitives: What Your Server Can Expose</strong></h2><p>An MCP server exposes its capabilities through three core primitives: tools, resources, and prompts. Everything in the protocol is built around these three concepts. 
Get them right, and the rest falls into place.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l0Kg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l0Kg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png 424w, https://substackcdn.com/image/fetch/$s_!l0Kg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png 848w, https://substackcdn.com/image/fetch/$s_!l0Kg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png 1272w, https://substackcdn.com/image/fetch/$s_!l0Kg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l0Kg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png" width="1402" height="756" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:756,&quot;width&quot;:1402,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:173177,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/194881760?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l0Kg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png 424w, https://substackcdn.com/image/fetch/$s_!l0Kg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png 848w, https://substackcdn.com/image/fetch/$s_!l0Kg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png 1272w, https://substackcdn.com/image/fetch/$s_!l0Kg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea2d8e90-8291-4d3e-8b8c-cb954f7177e4_1402x756.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The three primitives differ mainly in who controls them. <strong>Tools</strong> are model-controlled: the LLM decides when to call them to take actions. <strong>Resources</strong> are application-controlled data that the host can load into the model&#8217;s context. <strong>Prompts</strong> are user-controlled templates, typically surfaced as commands the user chooses to run. Most backend engineers building MCP servers will spend the bulk of their time on tools; that's where the action is. Resources are useful for giving the model structured context, and prompts are useful when you want to guide how the model interacts with your domain.</p><h2><strong>Building Your First MCP Server</strong></h2><p>Let&#8217;s build a real one: an MCP server that exposes a user service, something every backend engineer has shipped before. 
The server will let an agent look up a user, create a new one, and read user profiles as resources.</p><p>First, install the official TypeScript SDK and Zod for schema validation, plus TypeScript and <code>tsx</code> as dev dependencies:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">npm install @modelcontextprotocol/sdk zod
npm install -D typescript @types/node tsx</code></pre></div><h3><strong>Setting Up the Server</strong></h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// McpServer is the high-level API &#8212; handles capability negotiation,
// request routing, and input validation automatically.
const server = new McpServer({
  name: "user-service",
  version: "1.0.0",
});</code></pre></div><h3><strong>Registering a Tool</strong></h3><p>Here&#8217;s the most important primitive. A tool has a name, a description the LLM reads to decide when to use it, a typed input schema using Zod, and a handler that does the actual work:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">server.registerTool(
  "get_user",
  {
    description: "Look up a user by their ID. Returns name, email, and account status.",
    inputSchema: {
      userId: z.string().describe("The unique identifier of the user"),
    },
  },
  async ({ userId }) =&gt; {
    try {
      // Your real database call goes here (db stands in for your data layer, e.g. a Prisma client)
      const user = await db.users.findUnique({ where: { id: userId } });

      if (!user) {
        return {
          content: [{ type: "text", text: `No user found with ID: ${userId}` }],
          isError: true,
        };
      }

      return {
        content: [{
          type: "text",
          text: JSON.stringify({
            id: user.id,
            name: user.name,
            email: user.email,
            status: user.status,
          }, null, 2),
        }],
      };
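    } catch (err) {
      // Log to stderr: with the stdio transport, stdout carries protocol messages
      console.error("get_user failed:", err);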
      return {
        content: [{ type: "text", text: "Failed to retrieve user. Please try again." }],
        isError: true,
      };
    }
  }
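);</code></pre></div><p>Notice two things. First, the description matters enormously: the LLM reads it to decide whether to call this tool at all. Write it like you're documenting an API for a very smart but very literal colleague. Second, always handle errors explicitly and return <code>isError: true</code> rather than throwing. Reporting the failure inside the tool result lets the LLM see what went wrong and adjust (for example, by asking the user for a different ID), and it lets you control exactly which message reaches the model instead of leaking internal exception details. MCP tools should degrade gracefully.</p><h3><strong>Registering a Tool with Annotations</strong></h3><p>For tools that have side effects, use annotations to signal behavior to the host. Annotations are hints rather than guarantees: a host might use them to decide whether to ask the user for confirmation before running the tool, but it should not rely on them for security. This is how you tell the host, and through it the LLM, what kind of action this tool performs:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">server.registerTool(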
  "create_user",
  {
    description: "Create a new user account. Sends a verification email on success.",
    inputSchema: {
      name:  z.string().describe("Full name of the user"),
      email: z.string().email().describe("Email address &#8212; must be unique"),
      role:  z.enum(["admin", "member", "viewer"]).default("member"),
    },
    annotations: {
      readOnlyHint:  false,  // this tool DOES modify state
      destructiveHint: false, // it creates, not deletes
      idempotentHint:  false, // calling twice creates two users
      openWorldHint:   true,  // reaches outside &#8212; sends an email
    },
  },
  async ({ name, email, role }) =&gt; {
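    try {
      // Fast-path check for a friendly error message. A unique constraint on email
      // is still the real guard: two concurrent calls can both pass this check.
      const existing = await db.users.findUnique({ where: { email } });
      if (existing) {
        return {
          content: [{ type: "text", text: `User with email ${email} already exists.` }],
          isError: true,
        };
      }

      const user = await db.users.create({ data: { name, email, role } });
      await emailQueue.push({ type: "VERIFICATION", userId: user.id });

      return {
        content: [{ type: "text", text: `User created: ${user.id}` }],
      };
    } catch (err) {
      console.error("create_user failed:", err); // stderr, never stdout
      return {
        content: [{ type: "text", text: "Failed to create user. Please try again." }],
        isError: true,
      };
    }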
  }
);</code></pre></div><h3><strong>Registering a Resource</strong></h3><p>Resources expose read-only data that the host can load into the model&#8217;s context. Each resource is identified by a URI. A resource template such as <code>user://{userId}/profile</code> (an RFC 6570 URI template) describes a whole family of them, and the optional <code>list</code> callback lets clients enumerate concrete instances:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">import { ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";

server.registerResource(
  "user-profile",
  new ResourceTemplate("user://{userId}/profile", {
    list: async () =&gt; ({
      resources: (await db.users.findMany({ take: 50 })).map(u =&gt; ({
        uri:  `user://${u.id}/profile`,
        name: u.name,
      })),
    }),
  }),
  {
    title:       "User Profile",
    description: "Full profile data for a user, including preferences and permissions.",
    mimeType:    "application/json",
  },
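  async (uri, { userId }) =&gt; {
    // In production, select only the fields the model needs so that secrets
    // such as password hashes never leave the server
    const user = await db.users.findUnique({ where: { id: userId } });
    if (!user) throw new Error(`User ${userId} not found`);
    return {
      contents: [{
        uri:      uri.href,
        mimeType: "application/json",
        text:     JSON.stringify(user),
      }],
    };
  }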
);</code></pre></div><h3><strong>Connecting the Transport and Starting the Server</strong></h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">const transport = new StdioServerTransport();
await server.connect(transport);

console.error("User service MCP server running on stdio");
// Note: use console.error for logs &#8212; stdout is reserved for MCP protocol messages</code></pre></div><h2><strong>How an MCP Request Actually Flows</strong></h2><p>Understanding the request lifecycle is what separates engineers who can use MCP from engineers who can <em>debug</em> it in production. Walk your interviewer through this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l7SW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l7SW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png 424w, https://substackcdn.com/image/fetch/$s_!l7SW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png 848w, https://substackcdn.com/image/fetch/$s_!l7SW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png 1272w, https://substackcdn.com/image/fetch/$s_!l7SW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l7SW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png" width="1406" height="1108" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1108,&quot;width&quot;:1406,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:254815,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/194881760?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l7SW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png 424w, https://substackcdn.com/image/fetch/$s_!l7SW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png 848w, https://substackcdn.com/image/fetch/$s_!l7SW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png 1272w, https://substackcdn.com/image/fetch/$s_!l7SW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40924e16-35b5-4165-a656-f3a5900dd26f_1406x1108.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>All of this happens over <strong>JSON-RPC 2.0</strong>, a simple, well-understood remote procedure call standard encoded as JSON. Each request names a <code>method</code> (for example <code>tools/list</code> or <code>tools/call</code>) and carries an <code>id</code> that the matching response echoes back. If you&#8217;ve worked with any RPC system before, the wire format will feel familiar.</p><h2><strong>Transport Mechanisms: stdio vs Streamable HTTP</strong></h2><p>The MCP specification defines two standard transports. Which one to use depends on where your server will run, so discuss this with your interviewer.</p><h3><strong>stdio (Standard Input/Output)</strong></h3><p>The host spawns your MCP server as a subprocess, and all communication happens over <code>stdin</code> and <code>stdout</code>. There is no network hop, so latency is minimal, and there is no network authentication to configure. The trade-off is that it only works locally: the host must run on the same machine as the server.</p><p>This is what Claude Desktop uses by default. 
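It&#8217;s a good fit for developer tools, local agents, and anything that runs on a single machine.</p><h3><strong>Streamable HTTP</strong></h3><p>This is the transport for remote servers. A single HTTP endpoint handles both directions: clients POST messages to it, and the server can answer with plain JSON or with a Server-Sent Events (SSE) stream when it needs to push progress or notifications. Because it is ordinary HTTP, it works behind standard load balancers and gateways. It scales horizontally most easily in stateless mode, as in the code below, where every request gets a fresh server and transport. If you enable sessions instead, you need sticky routing or shared session state.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import express from "express";

const app = express();
app.use(express.json());

// Stateless mode: a fresh server and transport for every request, so any replica
// behind a load balancer can handle any request
app.post("/mcp", async (req, res) =&gt; {
  const requestServer = new McpServer({ name: "user-service", version: "1.0.0" });
  registerAllTools(requestServer); // your tool/resource registrations, defined elsewhere

  // sessionIdGenerator: undefined disables sessions (no Mcp-Session-Id header)
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  res.on("close", () =&gt; {
    transport.close();
    requestServer.close();
  });

  try {
    await requestServer.connect(transport);
    // express.json() has already consumed the body, so pass the parsed body in
    await transport.handleRequest(req, res, req.body);
  } catch (err) {
    console.error("MCP request failed:", err);
    if (!res.headersSent) {
      res.status(500).json({ jsonrpc: "2.0", error: { code: -32603, message: "Internal error" }, id: null });
    }
  }
});

// Without sessions there is no standalone SSE stream (GET) or session to end (DELETE)
app.get("/mcp", (req, res) =&gt; res.status(405).end());
app.delete("/mcp", (req, res) =&gt; res.status(405).end());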

app.listen(3000, () =&gt; console.log("MCP server listening on :3000/mcp"));</code></pre></div><div class="callout-block" data-callout="true"><p><strong>Rule of thumb:</strong> Use stdio for local developer tools and CLI agents. Use Streamable HTTP for anything that needs to run remotely, scale horizontally, or be consumed by multiple hosts simultaneously.</p></div><h2><strong>Security Considerations</strong></h2><p>MCP is powerful precisely because it gives AI models the ability to execute real actions against real systems. That power demands real security thinking. Raise these points with your interviewer; they are what distinguish candidates who have thought deeply about MCP from those who haven&#8217;t.</p><ul><li><p><strong>Authentication for remote servers:</strong> The MCP authorization spec treats a remote MCP server as an OAuth 2.1 resource server. Require OAuth access tokens with narrow scopes and validate them on every request. Never expose an unauthenticated remote MCP server to the internet; security scans of public MCP servers in 2025 found thousands with no authentication at all.</p></li><li><p><strong>Principle of least privilege:</strong> Your MCP server&#8217;s database credentials should only grant access to what that server&#8217;s tools actually need. A read-only tool should connect with a read-only credential. Don&#8217;t give an MCP server access to your entire database just because it&#8217;s convenient.</p></li><li><p><strong>Input validation:</strong> Define a Zod schema for every tool input; the TypeScript SDK checks incoming arguments against it before your handler runs. MCP tool inputs come from an LLM, and LLMs can produce unexpected values, so treat every tool call like an untrusted API request.</p></li><li><p><strong>Prompt injection awareness:</strong> Whatever your tool returns goes straight into the model&#8217;s context. Text stored in your systems (user-generated content, for example) can therefore carry injected instructions, and a compromised or malicious server can do the same deliberately. Return structured data, clearly delimit untrusted content, and avoid echoing raw user input back in tool results.</p></li><li><p><strong>Audit logging:</strong> Log every tool call: who called it, with what arguments, and what was returned. MCP servers are a privileged execution path, and you need to be able to reconstruct exactly what an agent did when something goes wrong.</p></li></ul><h2><strong>When to Use MCP vs Other Approaches</strong></h2><p>Not every backend service needs an MCP server. Be honest with your interviewer about when MCP is the right choice and when it isn&#8217;t.</p><p><strong>Use MCP when:</strong></p><ul><li><p>You want your backend service to be discoverable by multiple AI hosts without writing custom integrations for each</p></li><li><p>You are building an AI agent that needs to interact with internal tools (databases, APIs, file systems) in a structured, auditable way</p></li><li><p>You want to make capabilities originally built for human-driven API clients available to AI models</p></li><li><p>You are building developer tooling that AI-powered IDEs like Cursor, Windsurf, or VS Code should be able to use</p></li></ul><p><strong>Don&#8217;t use MCP when:</strong></p><ul><li><p>Your only consumer is a human-driven frontend; REST or GraphQL is simpler and more appropriate</p></li><li><p>You need low-latency, high-volume bidirectional streaming such as live audio or market data; MCP exchanges JSON-RPC messages built around discrete tool calls and resource reads, and it is not a general-purpose streaming channel like WebSockets</p></li><li><p>You&#8217;re doing a quick one-off AI integration and the overhead of building a full MCP server isn&#8217;t justified; calling your model provider&#8217;s function-calling API directly may be faster</p></li></ul><h2><strong>Final Answer</strong></h2><div class="callout-block" data-callout="true"><p><em>&#8220;MCP is an open standard that solves the N&#215;M AI integration problem by defining a single protocol any AI host can use to discover and invoke capabilities on any MCP server. 
The architecture has three roles: the host, the AI application that runs the model and coordinates the session; the client, a connector the host creates for each server, which maintains one stateful protocol connection to it; and the server, which exposes capabilities through three primitives (tools for actions, resources for read-only data, and prompts for reusable templates). To expose a backend service, I&#8217;d build an MCP server using the official TypeScript SDK, register tools with Zod-validated schemas and clear descriptions, and deploy over Streamable HTTP for remote access. I&#8217;d secure it with OAuth, apply least-privilege database credentials, validate all inputs, and audit-log every tool call. The result is a backend service that any MCP-compatible AI agent can discover and use without custom integration code.&#8221;</em></p></div><p>Understanding MCP <em>sounds</em> like keeping up with the latest AI hype. But as you dig in, you realize it&#8217;s really a systems engineering problem that every backend engineer should recognize:</p><ul><li><p>Capability discovery: how does a consumer learn what a service can do?</p></li><li><p>Schema-first contracts: typed inputs and outputs, validated at the boundary</p></li><li><p>Stateful session management over a transport layer</p></li><li><p>Least-privilege security in a programmatic execution context</p></li><li><p>Audit logging for an automated, non-human caller</p></li></ul><p>These are not new concepts. They are the same engineering principles you apply to every production API you&#8217;ve ever built. MCP just applies them to a new class of consumer: the AI agent.</p><p>The engineers who will be most valuable in the next few years aren&#8217;t the ones who know the most AI frameworks. 
They are the ones who know how to build reliable, secure, observable backend services &#8212; and understand how to expose those services to AI systems correctly.</p><p>So the next time someone asks you, <strong>&#8220;What is MCP?&#8221;</strong> don&#8217;t just say, &#8220;It&#8217;s how AI models talk to tools.&#8221;</p><p>Explain the N&#215;M problem it solves. Walk them through the three primitives. Explain why the description of a tool is as important as the implementation. Talk about what happens when an agent calls your server with an unexpected input value.</p><p>That&#8217;s the answer that shows you understand what&#8217;s actually being built &#8212; not just the name of the protocol.</p><div><hr></div><p>I hope you learned something today: <strong>Spread the love.</strong> Share this newsletter with at least two of your friends today.</p><p>Also, let me know if you enjoy this series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p><strong>Remember to start learning backend engineering from our courses:</strong></p><p>Get a <strong>50% discount</strong> on any of these courses. 
Reach out to me (Reply to this mail)</p><ol><li><p><strong><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></strong></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><strong><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/books">All Backend Books</a></strong></p></li><li><p><strong><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/community">Join our Community</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></strong></p></li></ol><div><hr></div><div class="callout-block" data-callout="true"><p><strong>Whenever you&#8217;re ready</strong></p><p>There are 3 ways I can help you become a great backend engineer:</p><p><strong>1. <a href="https://app.masteringbackend.com/">The MB Platform:</a></strong> Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a></strong> The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. <a href="https://getbackendjobs.com/">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. 
Lastly, you can hire backend engineers anywhere in the world.</p></div><div><hr></div><p><strong>LAST WORD &#128075;</strong></p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello &#8212; I&#8217;d love to hear from you!</p><p><em><strong>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</strong></em></p>]]></content:encoded></item><item><title><![CDATA[AI Workshop: How to Land AI Engineering Jobs]]></title><description><![CDATA[Learn all the strategies needed to Land AI Engineering Jobs]]></description><link>https://kaperskyguru.substack.com/p/ai-workshop-how-to-land-ai-engineering</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/ai-workshop-how-to-land-ai-engineering</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 21 Mar 2026 13:00:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mkTW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#8220;&#128075;&#8221;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><div><hr></div><blockquote><p><em>Today&#8217;s issue is brought to you by <strong><a href="https://masteringbackend.com/">Masteringbackend</a></strong> &#8594; An all-in-one platform that helps backend engineers become highly-paid backend and AI engineers by leveraging a practical-based learning approach.</em></p></blockquote><div><hr></div><p>Thanks for reading Backend Weekly! 
Subscribe for free to receive new posts and support my work.</p><div><hr></div><p>Here&#8217;s another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and interview design questions.</p><h4><strong>Before we dive in:</strong></h4><blockquote><p><strong>If you want your LLM or AI Agent to </strong>seamlessly search, navigate, and extract real-time data from any website without any blockers and CAPTCHAs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://brightdata.com/ai?utm_source=brand&amp;utm_campaign=brnd-mkt_newsletter_backendweekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mkTW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 424w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 848w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 1272w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mkTW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png" width="1456" height="735" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:735,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:250849,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://brightdata.com/ai?utm_source=brand&amp;utm_campaign=brnd-mkt_newsletter_backendweekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/191491706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!mkTW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 424w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 848w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 1272w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://brightdata.com/ai?utm_source=brand&amp;utm_campaign=brnd-mkt_newsletter_backendweekly&quot;,&quot;text&quot;:&quot;Try BrightData MCP.&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://brightdata.com/ai?utm_source=brand&amp;utm_campaign=brnd-mkt_newsletter_backendweekly"><span>Try BrightData MCP.</span></a></p></blockquote><div><hr></div><p>I&#8217;m hosting a free workshop TODAY. 
</p><h2><strong>&#8220;How to Land AI Engineering Jobs&#8221;</strong></h2><p>A free workshop for backend engineers who want to break into AI roles.</p><p><strong>Guest: Victor Eduoh, Founder of <a href="http://laand.me">Laand</a></strong> </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2PUL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2PUL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png 424w, https://substackcdn.com/image/fetch/$s_!2PUL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png 848w, https://substackcdn.com/image/fetch/$s_!2PUL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png 1272w, https://substackcdn.com/image/fetch/$s_!2PUL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2PUL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png" width="1456" height="1827" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1827,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5562938,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/191661985?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2PUL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png 424w, https://substackcdn.com/image/fetch/$s_!2PUL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png 848w, https://substackcdn.com/image/fetch/$s_!2PUL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png 1272w, https://substackcdn.com/image/fetch/$s_!2PUL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa982ebc4-00b5-4ec0-b26e-5fd5ff77925c_3240x4065.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><br>&#128197; Date: Today, Saturday, March 21st <br>&#9200; Time: 4:00 PM UTC <br>&#9201;&#65039; Duration: 60-90 minutes <br>&#128176; Price: Free</p><p><strong><a href="https://luma.com/whx3268j">JOIN THE WORKSHOP &#8594;</a></strong></p><h3>Why this workshop?</h3><p>The AI job market is confusing right now.</p><p>Everyone says &#8220;learn AI&#8221;.</p><p>But which skills actually matter? How do you position yourself when everyone claims AI experience? What do hiring managers actually look for?</p><p>I don&#8217;t have all the answers. So I brought in someone who does.</p><p>Victor runs <a href="https://laand.me?ref=backendweekly">Laand.me</a>, a platform that helps people land jobs. He sees hundreds of applications. 
He knows what works and what doesn&#8217;t from the hiring side.</p><h4><strong>What we&#8217;re covering:</strong></h4><ul><li><p>Which AI skills are actually in demand (not what&#8217;s hyped)</p></li><li><p>How to position your backend experience for AI roles</p></li><li><p>What makes applications stand out, and what gets them rejected</p></li><li><p>Common mistakes that kill your chances</p></li><li><p>Live Q&amp;A</p></li></ul><h4><strong>Who should attend:</strong></h4><ul><li><p>Backend engineers looking at AI roles</p></li><li><p>Anyone confused about the AI job market</p></li><li><p>Engineers updating their resumes for 2026</p></li><li><p>People who want to know what hiring managers see</p></li></ul><p>Can&#8217;t make it live?</p><p>Register anyway. You&#8217;ll get the recording.</p><p><a href="https://luma.com/whx3268j">JOIN THE WORKSHOP &#8594;</a></p><p>See you at 4 PM UTC.</p><div><hr></div><p>I hope you learned something today. <strong>Spread the love:</strong> share this newsletter with at least two of your friends.</p><p>Also, let me know if you enjoy this series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p><strong>Remember to start learning backend engineering from our courses:</strong></p><p>Get a <strong>50% discount</strong> on any of these courses. 
Reach out to me by replying to this email.</p><ol><li><p><strong><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></strong></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><strong><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/books">All Backend Books</a></strong></p></li><li><p><strong><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/community">Join our Community</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></strong></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com/">The MB Platform:</a> </strong>Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learning and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a> </strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. 
<a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[How would you design a globally distributed configuration propagation service?]]></title><description><![CDATA[How to push config updates to tens of thousands of servers in seconds, with versioning, rollback, and strong delivery guarantees &#8212; and explain it like a senior engineer in your next interview.]]></description><link>https://kaperskyguru.substack.com/p/how-would-you-design-a-globally-distributed</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/how-would-you-design-a-globally-distributed</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 21 Mar 2026 10:06:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YFS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#8220;&#128075;&#8221;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><div><hr></div><blockquote><p><em>Today's issue is brought to you by <strong><a 
href="https://masteringbackend.com/">Masteringbackend</a></strong> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and AI engineers through a practice-based learning approach.</em></p></blockquote><div><hr></div><p>Here's another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and interview design questions.</p><h4><strong>Before we dive in:</strong></h4><blockquote><p><strong>If you want your LLM or AI agent to </strong>seamlessly search, navigate, and extract real-time data from any website without running into blockers or CAPTCHAs, check out BrightData MCP.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mkTW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!mkTW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 424w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 848w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 1272w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mkTW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png" width="1456" height="735" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:735,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:250849,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/191491706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mkTW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 424w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 848w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 1272w, https://substackcdn.com/image/fetch/$s_!mkTW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa57a93b9-53ff-4c7f-bf4c-2a67a710cfaf_2580x1302.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://brightdata.com/ai?utm_source=brand&amp;utm_campaign=brnd-mkt_newsletter_backendweekly&quot;,&quot;text&quot;:&quot;Try BrightData MCP.&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://brightdata.com/ai?utm_source=brand&amp;utm_campaign=brnd-mkt_newsletter_backendweekly"><span>Try BrightData MCP.</span></a></p></blockquote><div><hr></div><p><strong>This is the MB Interview Series on Backend Weekly, which airs every Saturday.</strong></p><p><em>In this series, I will guide you through answering common backend engineering interview questions, covering topics such as system design, microservices, API design, and databases. 
</em></p><p><em>Let's get started with episode 6 (<strong><a href="https://kaperskyguru.substack.com/p/how-would-you-design-a-globally-consistent">Episode 5 Here</a></strong>):</em></p><div><hr></div><h2><strong>The Interview Scenario</strong></h2><p>You&#8217;re in a backend interview.</p><p>They ask:</p><p><em><strong>&#8220;Design a globally distributed configuration propagation service that pushes config updates to tens of thousands of servers within seconds, with versioning, rollback, and strong delivery guarantees.&#8221;</strong></em></p><p>Here&#8217;s how to approach it:</p><div><hr></div><blockquote><p>We are building the next Interview Prep Playground targeting backend engineers.<br>Join our MB Interview waitlist: <strong><a href="https://tally.so/r/w46glb">https://tally.so/r/w46glb</a></strong></p></blockquote><div><hr></div><p>Every production system has configuration: <strong>feature flags</strong>, <strong>database connection strings</strong>, <strong>rate limit thresholds</strong>, <strong>encryption keys</strong>, and <strong>service discovery endpoints</strong>. Managing these at scale is one of the most underestimated problems in backend engineering.</p><p>Pushing a config update to ten servers is trivial. What about pushing it to ten thousand servers spread across multiple regions within two seconds, while guaranteeing that every server gets exactly the right version and that you can roll back instantly?</p><p>That&#8217;s a whole different class of problem.</p><p>Strong backend engineers do not jump into the design without first understanding the exact requirements, so start by clarifying them with your interviewer.</p><h2><strong>Clarify the Requirements</strong></h2><p>Before drawing a single diagram, ask your interviewer what &#8220;correct&#8221; looks like for this system. 
For a config propagation service, the core requirements are:</p><ul><li><p><strong>Sub-second to low-second propagation:</strong> Config changes must reach all connected agents worldwide within seconds, not minutes.</p></li><li><p><strong>Immutable versioning:</strong> Every config change produces a new, immutable version. Old versions are never mutated.</p></li><li><p><strong>Atomic regional rollout:</strong> Config versions can be scoped to a region and deployed atomically; either all agents in a region get version N, or none do.</p></li><li><p><strong>Instantaneous rollback:</strong> Rolling back must be as fast as rolling forward. It&#8217;s not a revert. It&#8217;s a pointer swap.</p></li><li><p><strong>Integrity guarantees:</strong> Agents must verify the checksum and cryptographic signature of every config they receive. Tampered configs are rejected.</p></li><li><p><strong>Durability and auditability:</strong> Every version, every rollout, and every agent acknowledgement must be logged immutably.</p></li><li><p><strong>Conflict-free:</strong> Two operators must not be able to create conflicting versions simultaneously. The system must enforce strict ordering.</p></li></ul><p>Discuss each of these with your interviewer. They may tell you that eventually consistent propagation is acceptable, or that rollback can be asynchronous. These trade-offs will shape the entire architecture.</p><p>Once you agree on the requirements, move on to the architecture.</p><h2><strong>High-Level Architecture</strong></h2><p>A production-grade config propagation service is built from four major layers. 
Each layer has a clear boundary and a single responsibility, and mixing them is how systems fail.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fGCY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fGCY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png 424w, https://substackcdn.com/image/fetch/$s_!fGCY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png 848w, https://substackcdn.com/image/fetch/$s_!fGCY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png 1272w, https://substackcdn.com/image/fetch/$s_!fGCY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fGCY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png" width="1218" height="1222" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1222,&quot;width&quot;:1218,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:139764,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/191491706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fGCY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png 424w, https://substackcdn.com/image/fetch/$s_!fGCY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png 848w, https://substackcdn.com/image/fetch/$s_!fGCY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png 1272w, https://substackcdn.com/image/fetch/$s_!fGCY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ec48a6-f474-4558-a9e8-02df09084a6b_1218x1222.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Let&#8217;s break each layer down:</p><h2><strong>Core Components</strong></h2><h3><strong>Control Plane API</strong></h3><p>The control plane is the gatekeeper of the entire system. It is intentionally thin, stateless, and strict. An admin submits a config change here, and the control plane validates it, assigns an immutable version ID, signs it, and writes it to the version store.</p><p>That is all it does. In particular, it does not notify agents directly.</p><p>Here&#8217;s a simplified version of the publish endpoint, written in TypeScript:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">import { Request, Response } from "express";
import { createHash, sign } from "crypto";
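import { versionStore } from "./store"; // hypothetical version store client

// Assumed setup (not shown in the original): the control plane's PEM-encoded
// private key is injected via the environment (a KMS or HSM in production), and
// generateVersionId returns a roughly time-ordered ID. Strict ordering across
// control plane instances must still be enforced by the version store itself.
const PRIVATE_KEY = process.env.CONFIG_SIGNING_KEY ?? "";

function generateVersionId(): string {
  return "v" + Date.now().toString(36) + "-" + Math.random().toString(36).slice(2, 8);
}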

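// Optional hardening sketch (an assumption, not part of the original design):
// JSON.stringify emits object keys in insertion order, so two logically identical
// payloads can produce different checksums. This hypothetical canonicalJson helper
// sorts object keys recursively. The control plane and every agent could both hash
// canonicalJson(payload) instead of JSON.stringify(payload). It assumes the payload
// is plain JSON data (as produced by JSON.parse).
function canonicalJson(value: unknown): string {
  // JSON primitives (strings, numbers, booleans) and null serialize as usual
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  // Arrays keep their element order; only object keys are sorted
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  const obj = value as { [key: string]: unknown };
  const parts: string[] = [];
  for (const key of Object.keys(obj).sort()) {
    // Mirror JSON.stringify, which drops properties whose value is undefined
    if (obj[key] === undefined) continue;
    parts.push(JSON.stringify(key) + ":" + canonicalJson(obj[key]));
  }
  return "{" + parts.join(",") + "}";
}
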
export async function publishConfig(req: Request, res: Response) {
  const { configPayload, region, submittedBy } = req.body;

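  // 1. Validate required fields (a real service would also run full schema validation)
  if (!configPayload || !region) {
    return res.status(400).json({ error: "Invalid config request" });
  }

  // 2. Compute a SHA-256 checksum of the serialized payload. JSON.stringify
  //    follows key insertion order, so agents must hash the exact same serialization.
  const checksum = createHash("sha256")
    .update(JSON.stringify(configPayload))
    .digest("hex");

  // 3. Sign the checksum with the control plane private key
  const signature = sign("sha256", Buffer.from(checksum), PRIVATE_KEY);

  // 4. Write an immutable version record (simplified: in the full design the
  //    payload goes to object storage and this record keeps only a pointer to it)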
  const version = await versionStore.create({
    versionId: generateVersionId(),
    configPayload,
    checksum,
    signature: signature.toString("base64"),
    region,
    status: "PENDING",
    createdAt: Date.now(),
    createdBy: submittedBy,
  });

  return res.status(201).json({ versionId: version.versionId });
}</code></pre></div><h3><strong>Version Store</strong></h3><p>The version store is the single source of truth for the entire system. Every config version ever published lives here, forever, immutably. You never update a version record; you only append new ones or move a region&#8217;s <code>activeVersionPointer</code>.</p><p>Use a strongly consistent store here. Good options to discuss with your interviewer:</p><ul><li><p><strong>etcd:</strong> Excellent for low-latency, strongly consistent KV with watch semantics. It is the backing store for Kubernetes.</p></li><li><p><strong>Google Spanner:</strong> Best for globally distributed, externally consistent SQL. Ideal when you need cross-region quorum writes without running a consensus system yourself.</p></li><li><p><strong>ZooKeeper:</strong> Battle-tested in large-scale systems (HBase, and Kafka before KRaft). Strongly consistent writes, with well-known recipes for leader election.</p></li></ul><p>The config payloads themselves are stored as immutable blobs in object storage (S3, GCS). The version store holds only metadata and a pointer to each blob. This keeps the strongly consistent store small and fast.</p><h3><strong>Regional Coordinators</strong></h3><p>Regional coordinators are the intelligence layer of the propagation path. Each coordinator owns a geographic region and is responsible for:</p><ul><li><p>Subscribing to version store change events (etcd watch, Spanner change streams)</p></li><li><p>Tracking which version each agent in its region has acknowledged</p></li><li><p>Publishing rollout metadata to the push clusters</p></li><li><p>Quarantining unhealthy agents that repeatedly fail to acknowledge</p></li><li><p>Reporting rollout health back to the control plane</p></li></ul><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// agentRegistry, rolloutStore, pushBus and VersionRecord are assumed (hypothetical) dependencies
async function onNewVersion(version: VersionRecord) {
  const agents = await agentRegistry.getActiveAgents(version.region);

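  // Record a PENDING rollout task per agent before signalling, so agents that
  // miss the push can still be reconciled. At fleet scale, batch these writes.
  for (const agent of agents) {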
    await rolloutStore.upsert({
      agentId: agent.id,
      targetVersionId: version.versionId,
      status: "PENDING",
      attempts: 0,
    });
  }

  // Signal push clusters to notify agents in this region
  await pushBus.publish(version.region, {
    type: "VERSION_AVAILABLE",
    versionId: version.versionId,
  });
}</code></pre></div><h3><strong>Push Clusters (Fan-out Layer)</strong></h3><p>Push clusters are stateless frontend nodes that maintain persistent WebSocket (or long-poll) connections with agents. Apart from those open connections, they hold no state; they are pure notification relays.</p><p>When a coordinator publishes a <code>VERSION_AVAILABLE</code> event, every push node subscribed to that region&#8217;s channel broadcasts it to all connected agents. The push node&#8217;s only job is to deliver the signal; the actual config fetch happens agent-side.</p><p>This separation (<em>signal vs. payload</em>) is what makes the fan-out scalable. You don&#8217;t push megabytes of config data through ten thousand WebSocket connections. You push a tiny notification that says, <strong>&#8220;version X is ready, go fetch it.&#8221;</strong></p><h3><strong>Edge Agents</strong></h3><p>Agents run on every server in your fleet. They are the final consumer of the config system, and they enforce the strongest guarantees. Each agent:</p><ul><li><p>Maintains a persistent WebSocket connection to the nearest push cluster in its region</p></li><li><p>Receives the <code>VERSION_AVAILABLE</code> notification</p></li><li><p>Fetches the config blob from object storage</p></li><li><p>Verifies the checksum, then verifies the cryptographic signature with the control plane&#8217;s public key</p></li><li><p>Applies the config atomically (persist to local disk, then swap the in-memory reference)</p></li><li><p>Sends an acknowledgement back to the coordinator</p></li><li><p>Retries the entire fetch-verify-apply cycle on any failure</p></li></ul><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">import { createHash, verify } from "crypto";

// Assumed agent-side dependencies (hypothetical, not defined here): objectStorage,
// localStore, coordinator, logger, sleep, AGENT_ID, MAX_RETRIES, CONTROL_PLANE_PUBLIC_KEY.
let currentConfig: unknown = null;
let currentVersionId: string | null = null; // restored from localStore on startup

async function onVersionAvailable(versionId: string) {
  // Idempotency guard: skip only if this version is already the active one.
  // Checking "ever applied" would wrongly ignore a rollback to an older version.
  if (currentVersionId === versionId) return;

  let attempts = 0;

  while (attempts &lt; MAX_RETRIES) {
    try {
      // 1. Fetch the config blob
      const blob = await objectStorage.get(versionId);

      // 2. Verify checksum (hash the same serialization the control plane hashed)
      const computedHash = createHash("sha256")
        .update(JSON.stringify(blob.payload))
        .digest("hex");
      if (computedHash !== blob.checksum) throw new Error("Checksum mismatch");

      // 3. Verify the signature over the checksum (mirrors sign() in publishConfig)
      const valid = verify(
        "sha256",
        Buffer.from(blob.checksum),
        CONTROL_PLANE_PUBLIC_KEY,
        Buffer.from(blob.signature, "base64"),
      );
      if (!valid) throw new Error("Signature verification failed");

      // 4. Persist first for restart resilience, then swap the in-memory reference
      await localStore.save(versionId, blob);
      currentConfig = blob.payload;
      currentVersionId = versionId;

      // 5. Acknowledge to coordinator
      await coordinator.ack({ agentId: AGENT_ID, versionId, status: "APPLIED" });
      return;

    } catch (err) {
      attempts++;
      const delay = 2 ** attempts * 500; // exponential backoff (add jitter in practice)
      logger.warn(`Retry ${attempts} for version ${versionId} in ${delay}ms`);
      await sleep(delay);
    }
  }

  // Report persistent failure to coordinator
  await coordinator.ack({ agentId: AGENT_ID, versionId, status: "FAILED" });
}</code></pre></div><h2><strong>The Primary Config Propagation Flow</strong></h2><p>Every config change passes through this exact lifecycle. Walk your interviewer through each step. This is the story of how a config update travels from an operator&#8217;s keyboard to ten thousand servers:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YFS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YFS7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png 424w, https://substackcdn.com/image/fetch/$s_!YFS7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png 848w, https://substackcdn.com/image/fetch/$s_!YFS7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png 1272w, https://substackcdn.com/image/fetch/$s_!YFS7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YFS7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png" width="1290" height="894" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:894,&quot;width&quot;:1290,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:106871,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/191491706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YFS7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png 424w, https://substackcdn.com/image/fetch/$s_!YFS7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png 848w, https://substackcdn.com/image/fetch/$s_!YFS7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png 1272w, https://substackcdn.com/image/fetch/$s_!YFS7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53d17ae-6f57-46ad-a5c2-8a0c75d9e983_1290x894.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Notice something critical: the config payload never flows through the push cluster. Steps 4 and 5 are deliberately separated.</p><p>The push cluster sends only a tiny signal, and the agent fetches the actual blob independently from object storage. This is what keeps the fan-out layer fast and lets it hold a very large number of connections cheaply.</p><h2><strong>Versioning and Rollback</strong></h2><p>Versioning is the backbone of the entire system.</p><p>Discuss it carefully with your interviewer: the way you model versions determines how rollback, auditing, and conflict resolution work.</p><p>The core principle: <strong>a version is never mutated.</strong></p><p>If a config change is wrong, you do not patch the version. 
You publish a new version that supersedes it, or you update the <code>activeVersionPointer</code> for a region to point back to a prior version.</p><p>That last part is rollback. It is not an undo operation. It is a pointer swap:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">async function rollback(region: string, targetVersionId: string) {
  // Validate that the target version exists and belongs to this region
  const target = await versionStore.getVersion(targetVersionId);
  if (!target || target.region !== region) {
    throw new Error("Invalid rollback target");
  }

  // The rollback IS a new publish event &#8212; same propagation path
  await versionStore.setActivePointer(region, targetVersionId);

  // This triggers coordinators and push clusters automatically
  // &#8212; the rollback is as fast as a forward deployment
  logger.info(`Rollback initiated: region=${region} target=${targetVersionId}`);
}</code></pre></div><p>Because rollback reuses the same propagation path, it has the same latency guarantees as a forward deployment. </p><p>There is no special rollback codepath. This is intentional: a codepath that only runs during incidents is rarely exercised, and untested code under incident pressure is how rollbacks fail.</p><h2><strong>Reliability and Delivery Guarantees</strong></h2><p>This is where you impress your interviewer. Most candidates design the happy path well. What distinguishes senior engineers is how they design for failure.</p><p>Discuss these guarantees explicitly:</p><h4><strong>At-Least-Once Notification, Exactly-Once Application</strong></h4><p>The push cluster may deliver <code>VERSION_AVAILABLE</code> more than once, and that&#8217;s acceptable. The agent protects against duplicate applications with an idempotency check at the start of <code>onVersionAvailable()</code>: if the agent has already applied that version locally, it discards the notification.</p><h4><strong>Commit = Checksum + Signature Verification</strong></h4><p>A version is not &#8220;committed&#8221; on an agent until that agent reports that the fetched blob passed both checksum and signature verification. The coordinator tracks this per agent. An agent that still reports <code>FAILED</code> after MAX_RETRIES attempts is quarantined (flagged for human review), and the version is not considered committed for that agent&#8217;s server.</p><h4><strong>Agents Offline for Extended Periods</strong></h4><p>An agent that reconnects after a long outage will not receive the WebSocket notification it missed, and that is fine. On reconnect, the agent compares its locally persisted version ID against the coordinator&#8217;s current active version for its region. If they differ, it fetches and applies the delta. 
This is the pull-on-reconnect pattern:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">async function onAgentReconnect(agentId: string) {
  const agentVersion = await localStore.getCurrentVersionId();
  const activeVersion = await coordinator.getActiveVersion(REGION);

  if (agentVersion !== activeVersion) {
    logger.info(`Version drift detected. Reconciling: ${agentVersion} &#8594; ${activeVersion}`);
    await onVersionAvailable(activeVersion);
  }
}</code></pre></div><blockquote><p><strong>Tell your interviewer this:</strong> The system must never require a running push connection to converge. Push-on-change is an optimization for speed. Pull-on-reconnect is the correctness guarantee. Always design both paths.</p></blockquote><h2><strong>Scaling Strategy</strong></h2><p>Scaling a config propagation service is not about scaling one big thing; it&#8217;s about independently scaling the right layers. Share this with your interviewer:</p><ul><li><p><strong>Control Plane API:</strong> Stateless. Scale horizontally with standard load balancing. Write throughput is low (config changes are rare compared to reads), so this is rarely the bottleneck.</p></li><li><p><strong>Version Store:</strong> Globally replicated via a multi-region quorum, for example an etcd cluster whose members span regions (so every pointer write needs a cross-region majority), or Google Cloud Spanner for a managed, globally consistent store.</p></li><li><p><strong>Regional Coordinators:</strong> Sharded by region. Each coordinator owns its region exclusively. No cross-coordinator coordination is needed. Add coordinators as regions expand.</p></li><li><p><strong>Push Clusters:</strong> Scaled by connection fan-out. Each push node handles tens of thousands of WebSocket connections. Add push nodes to increase connection capacity. Push nodes hold no durable state, so any push node can serve any agent in the region.</p></li><li><p><strong>Agents:</strong> Connect to the nearest push cluster using geo-DNS or latency-based routing. Each agent stores its applied version locally, so restarts do not require a full re-fetch.</p></li></ul><h3><strong>Backpressure via Staged Rollouts</strong></h3><p>At tens of thousands of agents, a simultaneous rollout causes a thundering herd against object storage. 
Use staged rollouts:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">const ROLLOUT_STAGES = [
  { percentage: 1,   waitMs: 30_000 },  // 1% canary &#8212; watch for 30s
  { percentage: 10,  waitMs: 60_000 },  // 10% early &#8212; watch for 60s
  { percentage: 50,  waitMs: 120_000 }, // 50% broad &#8212; watch for 2m
  { percentage: 100, waitMs: 0 },       // full fleet
];
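
// Hypothetical helpers (assumed, not part of the original design), sketched
// so the rollout loop below has something concrete to call:
// - sleep(ms) resolves after roughly `ms` milliseconds.
// - sample(items, fraction) returns a uniformly random subset containing
//   ceil(items.length * fraction) items, so even a 1% stage of a small
//   fleet still picks at least one agent. The input array is not mutated.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function sample(items, fraction) {
  const shuffled = [...items];
  // Fisher-Yates shuffle: every ordering is equally likely
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  // Round up, but never ask for more agents than exist
  const count = Math.min(shuffled.length, Math.ceil(shuffled.length * fraction));
  return shuffled.slice(0, count);
}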

async function stagedRollout(versionId: string, region: string) {
  const agents = await agentRegistry.getActiveAgents(region);

  for (const stage of ROLLOUT_STAGES) {
    const cohort = sample(agents, stage.percentage / 100);
    await pushBus.publishToCohort(cohort, { type: "VERSION_AVAILABLE", versionId });
    await waitForAcks(cohort, versionId); // wait for this cohort's acks; assumed to throw (halting the rollout) if the stage is unhealthy
    if (stage.waitMs &gt; 0) await sleep(stage.waitMs);
  }
}</code></pre></div><p>Staged rollouts give you progressive blast radius control. If agents in the 1% canary cohort start failing signature verification, you halt the rollout before it reaches the full fleet.</p><h2><strong>Observability and Monitoring</strong></h2><p>You can&#8217;t manage what you can&#8217;t see.</p><p>Discuss with your interviewer what to measure and what to alert on. Here&#8217;s what matters for this system specifically:</p><h3><strong>Metrics to Track</strong></h3><ul><li><p><strong>Propagation latency (p50, p95, p99):</strong> Measured per region, from version creation to each agent&#8217;s ack (excluding the deliberate waits between rollout stages). This is your SLO metric. It should be under 5 seconds at p99.</p></li><li><p><strong>Agent ack rate:</strong> What percentage of the fleet has acknowledged the current active version? Below 99% is a yellow alert. Below 95% is red.</p></li><li><p><strong>Ack skew:</strong> How spread out are acknowledgement timestamps? High skew indicates a push cluster bottleneck or a connectivity issue in part of the fleet.</p></li><li><p><strong>Signature verification failures:</strong> Any non-zero count here is an immediate incident. It could indicate a key rotation issue or a compromised config pipeline.</p></li><li><p><strong>Rollout stall rate:</strong> How often do staged rollouts stall at a given percentage? Repeated stalls at the same stage suggest fragile canary cohort selection.</p></li></ul><h3><strong>Tools</strong></h3><ul><li><p><strong>Prometheus + Grafana:</strong> Scrape coordinator and push cluster metrics. Build a dashboard per region showing propagation latency and ack curves.</p></li><li><p><strong>OpenTelemetry:</strong> Distributed traces from control plane &#8594; coordinator &#8594; push cluster &#8594; agent. 
This is the most direct way to pinpoint where latency is introduced in the propagation path.</p></li><li><p><strong>Structured logging:</strong> Every version creation, rollout event, agent ack, and signature result must produce a structured log entry with <code>versionId</code>, <code>agentId</code>, <code>region</code>, and <code>timestamp</code>. These logs are your audit trail.</p></li></ul><h3><strong>Alerts</strong></h3><ul><li><p>Region ack rate below 95% for more than 60 seconds after a publish</p></li><li><p>Any signature verification failure</p></li><li><p>Coordinator heartbeat missing for more than 30 seconds</p></li><li><p>Version drift between regions exceeding N minutes</p></li></ul><h2><strong>Edge Cases and Trade-offs</strong></h2><p>Great candidates discuss edge cases unprompted. Bring these up with your interviewer:</p><ul><li><p><strong>Coordinator overloaded during mass reconnect:</strong> If a regional outage causes thousands of agents to reconnect simultaneously, the coordinator sees a thundering herd of version-check requests. Fix: stagger reconnection with randomized jitter, and rate-limit version-check calls at the coordinator.</p></li><li><p><strong>Split-brain version pointers:</strong> Two coordinators disagree on the active version for a region after a network partition. Fix: the version store uses a strong quorum for all pointer writes, so no coordinator can update the active pointer without majority agreement.</p></li><li><p><strong>Cost trade-off &#8212; persistent connections vs periodic pull:</strong> Ten thousand persistent WebSocket connections cost real money in infrastructure. If ultra-low latency is not required, periodic pull (agents polling every 5 seconds) dramatically simplifies the system. 
Discuss this explicitly &#8212; it&#8217;s a valid design choice for some use cases.</p></li><li><p><strong>Blast radius during rollout:</strong> A bad config that reaches 100% of the fleet before failure is detected can take down your entire service. Progressive staged rollouts &#8212; with automatic halt on ack failure &#8212; are non-negotiable for production systems.</p></li></ul><h3><strong>Final Answer</strong></h3><div class="pullquote"><p><em>&#8220;I&#8217;d design this system using a global control plane with immutable versioning, regional coordinators for scoped rollout, and fan-out push clusters for low-latency propagation. Config payloads are stored as immutable blobs in object storage and signed by the control plane.&#8221;</em></p></div><p>Designing a globally distributed config propagation service <em>sounds</em> like a glorified key-value store with a webhook. But as you dig in, you realize it&#8217;s a deep exercise in:</p><ul><li><p>Immutable data modeling and version pointer semantics</p></li><li><p>Fan-out at scale without a thundering herd</p></li><li><p>Cryptographic integrity in a distributed trust model</p></li><li><p>Exactly-once application with at-least-once delivery</p></li><li><p>Progressive deployment as a first-class system primitive</p></li><li><p>Reconciliation for agents that were offline during a rollout</p></li></ul><p>These are the kinds of systems that separate engineers who can build features from engineers who can build infrastructure.</p><p>So the next time an interviewer asks, <strong>&#8220;How would you design a globally distributed configuration propagation service?&#8221;</strong> don&#8217;t just say, &#8220;I&#8217;ll use etcd and WebSockets.&#8221;</p><p>Walk them through the versioning model. Walk them through the signal vs. payload separation. Walk them through what rollback actually means at this scale. 
Walk them through what happens to an agent that was offline for six hours.</p><p>That&#8217;s the answer that shows you&#8217;ve built real systems &#8212; not just read about them.</p><div><hr></div><p>I hope you learned something today: <strong>Spread the love.</strong> Share this newsletter with at least two of your friends today.</p><p>Also, let me know if you enjoy this series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p><strong>Remember to start learning backend engineering from our courses:</strong></p><p>Get a <strong>50% discount</strong> on any of these courses. Reach out to me (reply to this email).</p><ol><li><p><strong><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></strong></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><strong><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/books">All Backend Books</a></strong></p></li><li><p><strong><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/community">Join our Community</a></strong></p></li><li><p><strong><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></strong></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com/">The MB Platform:</a> </strong>Join 4000+ backend engineers learning backend engineering on the MB platform. 
Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy: </a></strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. <a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[How to Debug AI Backend Systems]]></title><description><![CDATA[We will delve into Observability in AI systems. 
We will explore the fundamental problem, why your current logging system will fail you, how to build a proper observation system for AI]]></description><link>https://kaperskyguru.substack.com/p/how-to-debug-ai-backend-systems</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/how-to-debug-ai-backend-systems</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 14 Mar 2026 10:36:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tzlx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9b724e-bb27-4b40-967d-6804efa6ea64_392x392.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#128075;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a href="https://masteringbackend.com/">Masteringbackend</a> &#8594; An all-in-one platform that helps backend engineers become highly-paid backend and AI engineers by leveraging a practical-based learning approach.</em></p><div><hr></div><p>Here&#8217;s another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and system design interview questions.</p><p><strong>Before we dive in:</strong></p><p>Your AI Agents will be stupid with hallucinated results.<br><br>Unless you give them the right data.<br><br>Nimble makes AI Agents smarter by giving them access to real-time web data in a structured, tabular format.<br><br>With Nimble, you can:<br><br>- Turn the web into data tables, not just markdown with long text.<br>- Live Web Access, Not Stale Indexes<br>- Access Any Website (Even JS Heavy Ones)<br><br>Imagine you&#8217;re searching for something like:<br><br>&#8220;Which stores have a PS5 in stock within 25 miles right now&#8212;include price, pickup time, and store address from each retailer&#8217;s site?&#8221;<br><br>This is a time-sensitive question, and your AI Agents won&#8217;t give the right result unless you use Nimble:<br><br>Watch this video:<br><br>Here&#8217;s the documentation to start with: <a href="https://docs.nimbleway.com">https://docs.nimbleway.com</a></p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;64aa3a61-ccec-417b-9d4e-6f6d6fe09ae5&quot;,&quot;duration&quot;:null}"></div><div><hr></div><p>I want to tell you about the worst three days of my engineering career.</p><p>It started with a Slack message from our support team:</p><p>&#8220;Users are complaining that the AI is giving wrong answers.&#8221;</p><p>That was 
it.</p><p>There were no error codes, stack traces, or even reproducible steps. Just... the AI was wrong.</p><p>I opened our logs and checked: the request was received at 10:47 AM, the response was sent at 10:49 AM, and the status was 200.</p><p>Everything looked perfect. The system had done exactly what it was supposed to do, which was very simple: accept a question, process it, and return an answer.</p><p>But the answer was completely wrong, and I had absolutely no idea why.</p><p>I spent the next three days in debugging hell. I tried reproducing the issue locally, but of course, the AI gave different answers each time. I added more logging, but I was logging the wrong things. I stared at database queries that returned the right data. I reviewed prompts that looked fine. I checked API responses that seemed correct. Every individual piece worked, but the whole system didn&#8217;t.</p><p>On day three, I finally found it. The problem was in our chunking strategy.</p><p>We were splitting documents in the middle of sentences, so when users asked certain questions, the retrieved context was grammatically incomplete and semantically meaningless. The AI was doing its best with garbage input, producing garbage output.</p><p>That was how I spent three days on a chunking bug. Not because the bug was hard to fix, but because I couldn&#8217;t see what was happening inside the system.</p><p>I was debugging blind.</p><p>That experience fundamentally changed how I think about AI observability. Traditional debugging alone doesn&#8217;t work for AI systems. The tools and practices we&#8217;ve developed over decades of software engineering (stack traces, error logs, breakpoints) are not enough. AI systems fail differently, and they need to be observed differently.</p><p>In this article, we will delve into observability in AI systems. 
We will explore the fundamental problem, why your current logging system will fail you, how to build a proper observability stack for AI systems, and finally, the debugging workflow that actually works.</p><h3><strong>The Fundamental Problem</strong></h3><p>Here&#8217;s what makes AI debugging so uniquely frustrating:</p><p>Traditional applications have the decency to crash when something goes wrong. They throw exceptions. They return error codes. They give you a stack trace pointing to the exact line where things went sideways.</p><p>You might not immediately know how to fix the problem, but at least you know where it is.</p><p>AI systems are different; they don&#8217;t do this. They keep running. They keep returning responses. They keep returning status 200. But the responses are incorrect, and nothing in your standard monitoring indicates this is happening.</p><p>Think about what happens in a RAG pipeline when something goes wrong. Maybe your embedding model is poorly suited for your domain, so queries about technical concepts get matched with vaguely related but ultimately unhelpful documents.</p><p>The system never throws an error; your vector database simply keeps returning the wrong documents.</p><p>The LLM receives this irrelevant context and, as LLMs do, generates a plausible-sounding response based on what it was given. The response is confident. The response is articulate. The response is wrong.</p><p>From the outside, everything looks fine. Your request latency is normal. Your error rate is zero. Your uptime is 100%. Meanwhile, users are getting incorrect information and losing trust in your product, and you have no idea it&#8217;s happening until someone complains.</p><p>This is the core challenge of AI observability: you&#8217;re not looking for crashes, you&#8217;re looking for degradation. You&#8217;re not hunting for errors, you&#8217;re hunting for quality problems. And quality problems are sneaky. 
They don&#8217;t announce themselves. They hide in the space between &#8220;working&#8221; and &#8220;working well.&#8221;</p><h3><strong>Why Your Current Logging Strategy Is Failing You</strong></h3><p>When I first started building AI systems, I logged what I always logged: inputs and outputs. Request came in with this question, response went out with this answer. Basic request/response logging, same as any API.</p><p>This is almost useless for AI debugging.</p><p>The problem is that AI systems aren&#8217;t simple request/response pipelines. They&#8217;re multi-step workflows where each step transforms data in ways that affect downstream steps.</p><p>A RAG query might involve embedding the question, searching a vector database, reranking the results, constructing a prompt, calling the LLM, and post-processing the response.</p><p>That&#8217;s six distinct operations, each with its own potential failure modes, and if you&#8217;re only logging the first input and final output, you&#8217;re blind to everything in between.</p><p>When something goes wrong, you need to know:</p><ul><li><p>Was the embedding generated correctly?</p></li><li><p>What chunks did the vector search return, and what were their similarity scores?</p></li><li><p>Did reranking change the order?</p></li><li><p>What context ended up in the prompt?</p></li><li><p>How many tokens were used?</p></li><li><p>What did the LLM actually see when it generated its response?</p></li></ul><p>Without this information, debugging becomes guesswork. You&#8217;re trying to figure out which of six potential failure points caused the problem, but you can only see the beginning and end. It&#8217;s like trying to debug a function when you can only see the input parameters and return value, not any of the intermediate computations.</p><p>I learned this lesson the hard way during those three days of debugging. My logs showed the question and the answer. 
They didn&#8217;t show that the retrieved chunks had unusually low similarity scores, which would have immediately pointed me toward a retrieval problem.</p><p>They didn&#8217;t show that the context contained incomplete sentences, which would have pointed me toward chunking.</p><p>I had to add this logging manually, re-deploy, wait for similar queries, and then analyze the results: three days of detective work that would have taken three minutes with proper observability.</p><h3><strong>Building an Observability Stack for AI Systems</strong></h3><p>After that experience, I completely redesigned how I approach AI observability. I now think of it in four layers, each building on the one below:</p><h4><strong>Structured Logging</strong></h4><p>The foundation is structured logging with AI-specific context. This isn&#8217;t just &#8220;log more stuff.&#8221; It&#8217;s logging the right stuff, in the right format, at the right points in your pipeline. Every operation that touches AI, such as embedding, retrieval, reranking, prompt construction, and generation, needs its own log entry with all the relevant context.</p><p>For an embedding operation, you need to capture which model you used, how many tokens were in the input, how long it took, and, ideally, an identifier that lets you correlate it with other operations in the same request.</p><p>For retrieval, you need the query, the number of results, the similarity scores for the retrieved results, and the time taken. For generation, you need the model, input tokens, output tokens, cost, and latency.</p><p>The key insight is that each log entry should be self-contained enough that, if something goes wrong, you can look at that entry and understand what happened at that step. 
You shouldn&#8217;t need to cross-reference five different log lines to piece together the story.</p><p>Here&#8217;s what a properly structured log entry looks like for a retrieval operation:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">{
  "trace_id": "abc-123-def-456",
  "timestamp": "2026-02-24T14:30:22.847Z",
  "operation": "vector_retrieval",
  "service": "rag-service",
  "query_text": "What is the refund policy for digital products?",
  "embedding_model": "text-embedding-3-small",
  "vector_db": "pinecone",
  "index": "knowledge-base-prod",
  "top_k_requested": 10,
  "results_returned": 10,
  "similarity_scores": [0.91, 0.87, 0.85, 0.82, 0.79, 0.76, 0.71, 0.68, 0.65, 0.61],
  "top_result_preview": "Digital products are eligible for refund within 14 days of purchase...",
  "latency_ms": 147,
  "metadata": {
    "user_id": "user-789",
    "session_id": "session-012"
  }
}</code></pre></div><p>With this level of detail, when a user reports a wrong answer, I can query my logs for that trace ID and immediately see what happened. If the similarity scores are all below 0.7, I know retrieval failed to find relevant content.</p><p>If the top result preview doesn&#8217;t match the query topic, I know there&#8217;s a mismatch somewhere in my embeddings or indexing.</p><p>I&#8217;m not guessing anymore; I&#8217;m diagnosing.</p><h4><strong>Distributed Tracing</strong></h4><p>The second layer is distributed tracing, which connects all these individual log entries into a coherent timeline. A trace shows you the entire journey of a request through your system:</p><p>&#8220;It started here, went through these operations in this order, took this long at each step, and ended here.&#8221;</p><p>Traces are invaluable because they show you not just what happened, but the sequence and timing.</p><p>I use OpenTelemetry for this, though there are other options. The key is that every operation creates a &#8220;span&#8221; with a start time, end time, and relevant attributes, and all spans in the same request share a trace ID.</p><p>When I&#8217;m debugging, I can pull up a trace and see a visual timeline of exactly how the request was processed.</p><h4><strong>Metrics</strong></h4><p>The third layer is metrics: aggregated measurements over time that indicate system health. While logs and traces help you debug individual requests, metrics help you understand patterns.</p><p>They answer questions such as:</p><ul><li><p>What&#8217;s my average retrieval similarity score this hour?</p></li><li><p>How has latency changed over the past week?</p></li><li><p>What percentage of queries are hitting my cache?</p></li></ul><p>Metrics turn individual observations into trends.</p><p>For AI systems, you need metrics that traditional APM tools don&#8217;t provide out of the box. 
I track things like:</p><ul><li><p>Average retrieval quality scores</p></li><li><p>Confidence score distributions</p></li><li><p>Token usage by model</p></li><li><p>Cost per request</p></li><li><p>Cache hit rates</p></li><li><p>Hallucination detection rates</p></li></ul><p>These metrics tell me when something is degrading before users start complaining.</p><h4><strong>Alerting</strong></h4><p>The fourth layer is alerting: automated notifications sent when metrics cross defined thresholds.</p><p>If my average retrieval similarity score drops below 0.7 for ten minutes, I want to know immediately. If my hourly AI spend exceeds my budget, I want an alert. If my error rate spikes, I want to be paged.</p><p>The goal of this four-layer stack is simple:</p><p>I never want to be surprised by an AI problem again. I want to catch degradation before users notice, and when users do report issues, I want to diagnose the root cause in minutes, not days.</p><h3><strong>The Debugging Workflow That Actually Works</strong></h3><p>Let me walk you through how I debug AI issues now, using the observability stack I just described.</p><p>A support ticket comes in:</p><p>&#8220;User asked about shipping times and got information about return policies instead.&#8221;</p><p>This is a classic symptom. The AI answered confidently, but it answered the wrong question.</p><h4><strong>Step 1:</strong></h4><p>First, I find the trace ID. Every response from my AI system includes a trace ID in the response metadata so that support can include it in tickets. If support didn&#8217;t include it, I can search my logs by user ID and timestamp to find the relevant trace.</p><h4><strong>Step 2:</strong></h4><p>Once I have the trace ID, I pull up the full trace. 
I&#8217;m looking at a timeline that shows me:</p><ul><li><p>Query received</p></li><li><p>Embedding generated</p></li><li><p>Vector search executed</p></li><li><p>Reranking applied</p></li><li><p>Prompt constructed</p></li><li><p>LLM called</p></li><li><p>Response returned</p></li></ul><p>Each step shows its duration and key attributes.</p><h4><strong>Step 3:</strong></h4><p>I start from the beginning. The query was &#8220;How long does shipping take?&#8221;</p><p>That&#8217;s correct; it matches what the user asked. The embedding was generated in 45ms using text-embedding-3-small.</p><p>Then I look at vector retrieval:</p><ul><li><p>10 results returned</p></li><li><p>Top score: 0.73</p></li><li><p>Top result preview: &#8220;Our return policy allows...&#8221;</p></li></ul><p>The top score is 0.73, which is on the lower end. And the top result preview is about returns, not shipping.</p><p>That&#8217;s the problem right there.</p><p>The retrieval step failed to find relevant shipping content and instead returned the most similar content it could find, which happened to be about returns.</p><h4><strong>Step 4:</strong></h4><p>Now I know where to look. Why didn&#8217;t retrieval find shipping content?</p><p>I can think through a few possibilities:</p><ul><li><p>Maybe there&#8217;s no content about shipping in the knowledge base.</p></li><li><p>Maybe the content exists but wasn&#8217;t embedded correctly.</p></li><li><p>Maybe there&#8217;s a mismatch between how the content was chunked and how the query was embedded.</p></li></ul><p>I search my knowledge base index for &#8220;shipping&#8221; to make sure the content exists.</p><p>Then I compare the embedding for that content with the query embedding; they should be similar. 
If they&#8217;re not, I might have an embedding model mismatch or a chunking problem.</p><p>In this case, I discovered that the shipping information lived inside a larger document on order processing, and the chunk containing the shipping details also covered order tracking, payment processing, and other topics.</p><p>The embedding for that chunk reflects all those topics, diluting the &#8220;shipping&#8221; signal.</p><p>In just four steps, I identified the root cause: a poor chunking strategy that buried the relevant content in too much surrounding context.</p><p>The fix is to re-chunk the order processing document into more focused sections, including one specifically about shipping times.</p><p>With this approach, the total debugging time was about 15 minutes, compared to three days of blind guessing.</p><h2><strong>Practical Implementation</strong></h2><p>I want to share some specific implementation patterns that have worked well for me, because the concepts are only useful if you can actually build them.</p><p>For logging, I created a simple class that wraps all my AI operations and automatically emits structured logs.</p><p>Every time I make an embedding, retrieval, or LLM call, it goes through this wrapper, which captures timing, token counts, scores, and other relevant metadata. The wrapper also propagates the trace ID, so all operations in a request are connected.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">class AILogger {
  constructor(serviceName) {
    this.serviceName = serviceName;
  }
  createTrace(requestId) {
    return new AITrace(requestId, this.serviceName);
  }
}
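// --- Decision-point logging: a hypothetical extension, not part of the
// original wrapper. These helpers only assume an object with a
// span(operation, data) method, such as AITrace below. They record why the
// pipeline took a path (e.g. a model fallback) and how reranking reordered
// chunks, so those decisions appear in the trace next to the timing data.
function logDecision(trace, decision, reason, details = {}) {
  return trace.span('decision', { step: 'decision', decision, reason, ...details });
}

function logRerankOrder(trace, before, after) {
  // Assumes each chunk carries an `id` field (an assumption for this sketch)
  const beforeIds = before.map(function (chunk) { return chunk.id; });
  const afterIds = after.map(function (chunk) { return chunk.id; });
  return trace.span('rerank_order', {
    step: 'reranking',
    before_ids: beforeIds,
    after_ids: afterIds,
    // True when reranking changed the order or the set of chunks
    order_changed: JSON.stringify(beforeIds) !== JSON.stringify(afterIds)
  });
}
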
class AITrace {
  constructor(requestId, serviceName) {
    this.traceId = requestId || crypto.randomUUID();
    this.serviceName = serviceName;
    this.spans = [];
    this.startTime = Date.now();
  }
  span(operation, data = {}) {
    const span = {
      traceId: this.traceId,
      operation,
      timestamp: new Date().toISOString(),
      elapsed_ms: Date.now() - this.startTime,
      service: this.serviceName,
      ...data
    };
  
    this.spans.push(span);
    
    // Send to your logging backend
    console.log(JSON.stringify(span));
    
    return span;
  }
  // Specific logging methods for common AI operations
  logEmbedding(model, inputTokens, latencyMs) {
    return this.span('embedding', {
      step: 'embedding',
      model,
      input_tokens: inputTokens,
      latency_ms: latencyMs
    });
  }

  logRetrieval(query, results, latencyMs) {
    return this.span('retrieval', {
      step: 'retrieval',
      query_length: query.length,
      chunks_retrieved: results.length,
      top_score: results[0]?.score,
      bottom_score: results[results.length - 1]?.score,
      scores: results.map(r =&gt; r.score),
      latency_ms: latencyMs
    });
  }

  logReranking(beforeCount, afterCount, latencyMs) {
    return this.span('reranking', {
      step: 'reranking',
      chunks_before: beforeCount,
      chunks_after: afterCount,
      latency_ms: latencyMs
    });
  }

  logGeneration(model, inputTokens, outputTokens, latencyMs) {
    return this.span('generation', {
      step: 'generation',
      model,
      input_tokens: inputTokens,
      output_tokens: outputTokens,
      latency_ms: latencyMs,
      cost_usd: this.estimateCost(model, inputTokens, outputTokens)
    });
  }

  logPrompt(systemPrompt, userPrompt, context) {
    return this.span('prompt_construction', {
      step: 'prompt',
      system_prompt_tokens: this.countTokens(systemPrompt),
      user_prompt_tokens: this.countTokens(userPrompt),
      context_tokens: this.countTokens(context),
      // Store the first 500 chars for debugging (not the full prompt, for privacy);
      // String() guards against a missing context, matching countTokens()
      context_preview: String(context ?? '').slice(0, 500)
    });
  }

  logResponse(response, confidence) {
    return this.span('response', {
      step: 'response',
      response_tokens: this.countTokens(response),
      confidence_score: confidence,
      response_preview: String(response ?? '').slice(0, 200)
    });
  }

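  estimateCost(model, inputTokens, outputTokens) {
    // Prices are USD per 1K tokens (list prices at the time of writing);
    // check your provider's current pricing before trusting these numbers
    const pricing = {
      'gpt-4': { input: 0.03, output: 0.06 },
      'gpt-4-turbo': { input: 0.01, output: 0.03 },
      'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
      'claude-3-sonnet': { input: 0.003, output: 0.015 }
    };
    // Unknown models fall back to gpt-3.5-turbo pricing, which
    // under-reports cost for more expensive models
    const p = pricing[model] || pricing['gpt-3.5-turbo'];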
    return ((inputTokens / 1000) * p.input) + ((outputTokens / 1000) * p.output);
  }

  countTokens(text) {
    // Rough estimation: ~4 chars per token
    return Math.ceil((text?.length || 0) / 4);
  }

  complete() {
    return this.span('trace_complete', {
      total_duration_ms: Date.now() - this.startTime,
      span_count: this.spans.length
    });
  }
}</code></pre></div><p>The critical insight is to log at decision points, not just at entry and exit.</p><ul><li><p>When my system decides to use a particular embedding model, I log that decision and why.</p></li><li><p>When reranking changes the order of results, I log the before and after rankings.</p></li><li><p>When I fall back to a different model because the primary one is slow, I log that fallback decision.</p></li></ul><p>These decision points are often where bugs hide, and having visibility into them makes debugging much faster.</p><p>For metrics, I use Prometheus with Grafana, but the specific tools matter less than the metrics you choose to track.</p><p>My dashboard shows me six key AI metrics at a glance:</p><ul><li><p>Average retrieval similarity score (quality)</p></li><li><p>P95 generation latency (performance)</p></li><li><p>Cache hit rate (efficiency)</p></li><li><p>Hourly token usage by model (cost)</p></li><li><p>Error rate by operation (reliability)</p></li><li><p>Requests per minute (volume).</p></li></ul><p>These six numbers give me a quick health check of my AI system. If any of them look unusual, I dig deeper.</p><p>For alerting, I&#8217;ve learned to alert on trends, not just thresholds. Alerting whenever a single query&#8217;s retrieval score drops below 0.7 isn&#8217;t very useful: scores fluctuate from query to query, so I&#8217;d get a lot of noise.</p><p>Instead, I alert when the average retrieval score over a 15-minute window falls below 0.75, which indicates genuine degradation rather than a few unlucky queries.</p><p>Similarly, I alert on the rate of change. For example, if my costs are growing 50% faster than my request volume, something is probably wrong even if the absolute numbers are still within budget.</p><h2><strong>The Tools</strong></h2><p>People often ask me which observability tools to use. Honestly, the specific tools matter less than the practice of properly instrumenting your code. 
That said, here are the tools I&#8217;ve had good experiences with:</p><ul><li><p>OpenTelemetry with Jaeger or Datadog works well for general-purpose tracing and metrics. These are battle-tested tools with good ecosystem support.</p></li><li><p>Langfuse is excellent if you want an open-source option for LLM-specific observability, and LangSmith works well if you&#8217;re using LangChain.</p></li><li><p>Helicone and Portkey are good choices if you want a proxy-based approach that adds observability with minimal code changes.</p></li></ul><p>The most important thing is to start with something. Imperfect observability is infinitely better than no observability. You can always improve your tooling later, but if you&#8217;re flying blind, you&#8217;re accumulating debugging debt with every deployment.</p><p>I want to be honest:</p><p>Building proper AI observability takes time. Instrumenting your code, setting up dashboards, and configuring alerts is not a trivial amount of work, and you might be tempted to skip it, especially when you&#8217;re trying to ship features quickly.</p><p>Don&#8217;t skip it.</p><p>The time you invest in observability pays for itself many times over when something goes wrong. And in AI systems, something will go wrong.</p><p>Models behave unexpectedly. Embeddings drift over time. Retrieval quality degrades as your knowledge base grows. These aren&#8217;t edge cases; they&#8217;re inevitable parts of operating AI in production.</p><p>When these problems happen, you have a choice: spend days debugging blind, or spend minutes with good observability.</p><p>The three days I spent debugging that chunking issue? That was time I could have spent building features. That was time my team spent frustrated and unproductive. That was time users spent getting wrong answers and losing trust.</p><p>Good observability isn&#8217;t just a technical practice. It&#8217;s a business investment. 
It&#8217;s the difference between AI systems you can confidently operate and AI systems that feel like ticking time bombs.</p><div><hr></div><p>I hope you learned something today. Spread the love. Share this newsletter with at least two of your friends today.</p><p>If you have questions about the bootcamp, reply to this email. I read everything.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:</p><p><strong>Get a 50% discount on any of these courses. Reach out to me (Reply to this mail)</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1. <a href="https://app.masteringbackend.com/">The MB Platform:</a></strong> Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. 
<a href="https://masteringbackend.com/academy">The MB Academy:</a></strong> The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. <a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying.</p><div><hr></div><p><strong>LAST WORD</strong> &#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[Announcement: AI Backend Engineer Bootcamp]]></title><description><![CDATA[The AI Backend Engineer Bootcamp starts April 1st.]]></description><link>https://kaperskyguru.substack.com/p/announcement-ai-backend-engineer</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/announcement-ai-backend-engineer</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Tue, 10 Mar 2026 09:34:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mR9g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#128075;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a href="https://masteringbackend.com">Masteringbackend</a> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and 
AI engineers through a practical, hands-on learning approach.</em></p><div><hr></div><p>I&#8217;ve been working on something for months. Now it&#8217;s ready.</p><p><strong>The AI Backend Engineer Bootcamp starts April 1st.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mR9g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mR9g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png 424w, https://substackcdn.com/image/fetch/$s_!mR9g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png 848w, 
https://substackcdn.com/image/fetch/$s_!mR9g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png 1272w, https://substackcdn.com/image/fetch/$s_!mR9g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mR9g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1070364,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/190487075?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mR9g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png 424w, 
https://substackcdn.com/image/fetch/$s_!mR9g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png 848w, https://substackcdn.com/image/fetch/$s_!mR9g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png 1272w, https://substackcdn.com/image/fetch/$s_!mR9g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb84e4f3a-adbd-4dcd-a4c5-3746ec94ba04_3600x2025.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Here&#8217;s what I realized: </p><p>Backend engineers are struggling with AI, not because they&#8217;re not smart enough, but because the resources are broken. </p><p>Most AI courses teach you to call an API and print &#8220;Hello World.&#8221; That&#8217;s not engineering.</p><p><strong>This bootcamp is different.</strong></p><p>You&#8217;ll build a complete AI-powered backend system from scratch:</p><ul><li><p><strong>Weeks 1-3:</strong> Backend foundations (auth, databases, APIs, caching, security)</p></li><li><p><strong>Weeks 4-5:</strong> AI infrastructure (vector databases, RAG, agents, cost controls)</p></li><li><p><strong>Week 6:</strong> Live defense &#8212; present your system and defend your architecture</p></li></ul><p>By the end, you&#8217;ll have production-ready code, not tutorial projects.</p><p><strong>Details:</strong></p><ul><li><p>Starts April 1st, 2026</p></li><li><p>6 weeks, 10-15 hours/week</p></li><li><p>30 spots only</p></li></ul><p><strong><a href="https://masteringai.dev">Enroll Now for 38% discount&#8594;</a></strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://masteringai.dev&quot;,&quot;text&quot;:&quot;Enroll Now for 38% discount&#8594;&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://masteringai.dev"><span>Enroll Now for 38% discount&#8594;</span></a></p><div><hr></div><p>Remember to start learning backend engineering from our courses:</p><p><strong>Get a 50% discount on any of these courses. 
Reach out to me (Reply to this mail)</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1. <a href="https://app.masteringbackend.com/">The MB Platform:</a></strong> Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a></strong> The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. 
<a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying.</p><div><hr></div><p><strong>LAST WORD</strong> &#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[FREE WORKSHOP: AI Beyond the Chatbots: Building Reliable Workflows]]></title><description><![CDATA[I&#8217;m hosting another free workshop &#8212; and this time I&#8217;m bringing reinforcement.]]></description><link>https://kaperskyguru.substack.com/p/free-workshop-ai-beyond-the-chatbots</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/free-workshop-ai-beyond-the-chatbots</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Thu, 05 Mar 2026 09:09:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JGUL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#128075;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a href="https://masteringbackend.com">Masteringbackend</a> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and AI engineers through a practical, hands-on learning approach.</em></p><div><hr></div>
<p>I&#8217;m hosting another free workshop &#8212; and this time I&#8217;m bringing reinforcement.</p><p>&#8220;AI Beyond the Chatbots: Building Reliable Workflows&#8221;</p><p>&#128197; Thursday, March 5th</p><p>&#9200; 4:00 PM UTC</p><p>&#128176; Free</p><p>JOIN: <a href="https://luma.com/wfe8hgoh">https://luma.com/wfe8hgoh</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://luma.com/wfe8hgoh" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JGUL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png 424w, https://substackcdn.com/image/fetch/$s_!JGUL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png 848w, 
https://substackcdn.com/image/fetch/$s_!JGUL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png 1272w, https://substackcdn.com/image/fetch/$s_!JGUL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JGUL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4365622,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://luma.com/wfe8hgoh&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/189974233?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JGUL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png 424w, 
https://substackcdn.com/image/fetch/$s_!JGUL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png 848w, https://substackcdn.com/image/fetch/$s_!JGUL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png 1272w, https://substackcdn.com/image/fetch/$s_!JGUL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f1b448-72b5-431e-90e7-ba4012bfb4a5_3600x2025.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Here&#8217;s the problem this workshop solves:</p><p>Most AI tutorials teach you to build chatbots. Call an API, get a response, done.</p><p>But production AI isn&#8217;t chatbots. It&#8217;s workflows:</p><ul><li><p>Multi-step processes</p></li><li><p>Data flowing between systems</p></li><li><p>Decisions that need to be reliable</p></li><li><p>Failures that need to be debugged</p></li></ul><p>And workflows break in ways chatbots don&#8217;t.</p><p>Jide will show you the reliability patterns that make AI workflows actually work &#8212; not just in demos, but in production.</p><p>What you&#8217;ll learn:</p><ul><li><p>Why the industry is moving from chatbots to workflows</p></li><li><p>The reliability system every AI workflow needs</p></li><li><p>Live demo of patterns in action</p></li></ul><p>The last workshop had 200+ attendees. This one is capped at 500.</p><p><a href="https://luma.com/wfe8hgoh">https://luma.com/wfe8hgoh</a></p>]]></content:encoded></item><item><title><![CDATA[How would you design a real-time collaborative document editing backend?]]></title><description><![CDATA[How would you design a real-time collaborative document editing backend?]]></description><link>https://kaperskyguru.substack.com/p/how-would-you-design-a-real-time</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/how-would-you-design-a-real-time</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 21 Feb 2026 08:43:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!strO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#128075;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is 
brought to you by <a href="https://masteringbackend.com">Masteringbackend</a> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and AI engineers through a practical, hands-on learning approach.</em></p><div><hr></div><p>Welcome to another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world system design and interview questions.</p><p><strong>Before we dive in:</strong></p><p>Check out the new GitHub for non-developers:</p><p>It's called <a href="http://remix.one?ref=backendweekly">Remix</a>. 
It allows users to chat with your app, propose changes in plain English, and you decide what to merge.</p><p><strong>How it works:</strong></p><p>Install SDK &#8594; one command setup, works with existing React Native apps</p><p><a href="http://remix.one?ref=backendweekly">Try it here:</a> </p><p>If you have an iPhone, try out Remix on TestFlight</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!omtm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!omtm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg 424w, https://substackcdn.com/image/fetch/$s_!omtm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg 848w, https://substackcdn.com/image/fetch/$s_!omtm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!omtm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!omtm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg" width="1200" height="1402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1402,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!omtm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg 424w, https://substackcdn.com/image/fetch/$s_!omtm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg 848w, https://substackcdn.com/image/fetch/$s_!omtm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!omtm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59cdf3fa-a73a-4698-9d8c-5cccbadf1443_1200x1402.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><p><strong>This is the MB Interview Series on Backend Weekly, which airs every Saturday.</strong></p><p><em>In this series, I will guide you through answering common backend engineering interview questions, covering topics such as system design, microservices, API design, and databases.</em></p><p>Let&#8217;s get started with episode 7 (<a href="https://kaperskyguru.substack.com/p/how-would-you-design-a-globally-consistent">Episode 6 Here</a>):</p><div><hr></div><h2><strong>The Interview Scenario</strong></h2><p>You&#8217;re in a backend interview.</p><p>They ask:</p><p>&#8220;How would you design a real-time collaborative document editing backend?&#8221;</p><p>Here&#8217;s how to approach it:</p><div><hr></div><p>To practice this question in real-time, we are building the next Interview Prep Playground.</p><p>Check it out here: <a href="https://masteringbackend.com/interviews">https://masteringbackend.com/interviews</a></p><div><hr></div><p>Now, let&#8217;s 
start by clarifying what real-time collaboration actually means.</p><h2>Understand the problem</h2><p>Solving any system design question becomes much simpler once you clearly understand the real problem.</p><p>Real-time collaborative editing is not really about text.</p><p>It is about keeping one shared document consistent while many users edit it at the same time.</p><p>Think about tools like Google Docs or Notion. When two people type at the same time:</p><ul><li><p>Nobody&#8217;s text disappears.</p></li><li><p>The document stays consistent.</p></li><li><p>Updates appear almost instantly.</p></li><li><p>Offline users can reconnect safely.</p></li></ul><p>That is exactly the goal you should lead with when you face this question in your next interview.</p><p>When designing this type of system, whether on the job or in an interview, your objectives should be to:</p><ul><li><p>Keep the document consistent across all users.</p></li><li><p>Handle concurrent edits safely.</p></li><li><p>Deliver updates in near real time (under 200 ms).</p></li><li><p>Prevent data loss during crashes.</p></li><li><p>Scale to thousands of active documents.</p></li></ul><p>Now that we understand the problem and the objectives to focus on, the real question becomes:</p><p><strong>How do you synchronize distributed edits safely and efficiently?</strong></p><h2>The High-Level Architecture</h2><p>Let&#8217;s look at the high-level architecture. 
A distributed collaborative backend like this involves many components and a lot of careful engineering.</p><p>However, we will stick to the high-level architecture to keep the picture clear.</p><p>This is the core of our architecture:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!strO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!strO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png 424w, https://substackcdn.com/image/fetch/$s_!strO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png 848w, https://substackcdn.com/image/fetch/$s_!strO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png 1272w, https://substackcdn.com/image/fetch/$s_!strO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!strO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png" width="1456" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:93532,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/188595071?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!strO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png 424w, https://substackcdn.com/image/fetch/$s_!strO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png 848w, https://substackcdn.com/image/fetch/$s_!strO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png 1272w, https://substackcdn.com/image/fetch/$s_!strO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab3e762-3b4c-4873-a000-a37451b0a0da_1601x691.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The diagram above shows the main components and how they work together to support smooth collaborative document editing.</p><p>Here is what each component does:</p><ul><li><p><strong>Client:</strong> Sends edit operations over a persistent WebSocket connection.</p></li><li><p><strong>Collaboration Server:</strong> Receives operations and routes them to the correct document instance.</p></li><li><p><strong>Document State Manager:</strong> Holds the in-memory state and version number of each active document.</p></li><li><p><strong>Conflict Resolution Engine:</strong> Resolves concurrent edits using OT or CRDTs.</p></li><li><p><strong>Database:</strong> Stores document snapshots and an append-only operation log.</p></li></ul><h2>Core Concepts</h2><p>In a collaborative editor, clients do not send the full document every time 
someone types; rather, they send operations.</p><p>This is an important point to share with your interviewer.</p><p>Here&#8217;s a code example in TypeScript. When a user types &#8220;H&#8221; at position 10, we send:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">interface EditOperation {
  type: "insert" | "delete";
  position: number;
  value?: string;
  length?: number;
  baseVersion: number;
  clientId: string;
}

const edit: EditOperation = {
  type: "insert",
  position: 10,
  value: "H",
  baseVersion: 42,
  clientId: "user-123"
};</code></pre></div><p>If you look closely, you will notice the <code>baseVersion</code> property. It tells the server which document version the user was editing when they made the change, so the server knows exactly which operations that client had not yet seen. This is critical for conflict resolution.</p><p>Be sure to explain to your interviewer why <code>baseVersion</code> matters for conflict resolution.</p><p>Let&#8217;s talk about that further:</p><h2>Conflict Resolution Strategy</h2><p>The hardest part of building a collaborative editing backend is handling <strong>concurrent edits</strong>. I experienced this firsthand while building our <a href="https://projects.masteringbackend.com?ref=backendweekly">MB Project playground</a>, where users build real backend projects in real time using our code editor.</p><p>Your concern at this stage is simple: if two users edit at the same time, we must merge their changes safely.</p><p>Here are two common strategies for solving this problem. Discuss their trade-offs with your interviewer and pick the one that best fits your use case.</p><ul><li><p><strong>Operational Transformation (OT):</strong> This strategy is used by Google Docs. The server transforms each incoming operation against the operations that have already been applied, so every client converges to the same document.</p></li><li><p><strong>CRDT (Conflict-Free Replicated Data Types):</strong> Systems like Figma draw on this approach. The data structure itself guarantees that replicas converge, without a central server transforming operations.</p></li></ul><p>For interviews, OT is usually simpler to explain because it fits naturally into a centralized server architecture.</p><h3><strong>Operational Transformation (OT)</strong></h3><p>Operational Transformation (OT) was one of the earliest serious attempts to solve this problem. 
</p><p>The idea is deceptively elegant: instead of rejecting concurrent operations, you transform them against each other so that, whatever order they arrive in, every replica converges to the same final state.</p><p>If two inserts target the same position, you define a deterministic tie-breaking rule using the client ID and shift one operation&#8217;s index accordingly. If an insert and a delete overlap, you adjust ranges so both operations preserve their original intent.</p><p>Here is a simplified transformation for two concurrent inserts, where <code>b</code> is transformed so it can be applied after <code>a</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">function transformInsertInsert(a, b) {
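  // Transform b so it can be applied after a has already been applied.
  // a was inserted before b's position: shift b right by a's length.
  if (a.position &lt; b.position) {
    return { ...b, position: b.position + a.value.length };
  }

  // a was inserted after b's position: b is unaffected.
  if (a.position &gt; b.position) return b;

  // Same position: deterministic tie-break by clientId (lower id goes first).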
  if (a.clientId &lt; b.clientId) {
    return { ...b, position: b.position + a.value.length };
  }

  return b;
}</code></pre></div><p>The snippet adjusts <code>b</code>&#8217;s position to account for the text <code>a</code> has already inserted, so both users&#8217; intent is preserved.</p><blockquote><p><em>Note that Operational Transformation (OT) may look simple on paper, but implementing it correctly in production is hard: every pair of operation types (insert/insert, insert/delete, delete/delete) needs its own transformation rule, and all of them must still converge.</em></p></blockquote><h3><strong>CRDT (Conflict-Free Replicated Data Types)</strong></h3><p>Conflict-Free Replicated Data Types (CRDTs) approach the problem from the opposite direction.</p><p>Rather than rewriting operations on the fly, CRDTs design the data structure itself so that concurrent modifications can be merged deterministically without transformation.</p><p>Instead of saying &#8220;insert at position 5,&#8221; a CRDT says &#8220;insert after element X,&#8221; where X has a unique, stable ID.</p><p>A simplified CRDT insertion might look like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">interface CharNode {
  id: string;        // unique logical timestamp
  value: string;
  next?: CharNode;
}

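// findNode is a hypothetical lookup (e.g. backed by a Map index), declared here only for typing.
declare function findNode(id: string): CharNode;

function insertAfter(targetId: string, newNode: CharNode): void {
  let prev = findNode(targetId);
  // Skip concurrent inserts with a larger id so every replica picks the same slot
  // (a simplified, RGA-style rule; ids are compared as strings here for brevity).
  while (prev.next &amp;&amp; prev.next.id &gt; newNode.id) {
    prev = prev.next;
  }
  newNode.next = prev.next;
  prev.next = newNode;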
}</code></pre></div><p>When two clients insert after the same node concurrently, their unique IDs determine a deterministic order, which eliminates the need for OT-style transformation.</p><h2>The Backend Architecture That Actually Works</h2><p>Whether you choose OT or CRDTs, the surrounding backend architecture tends to converge on a similar shape.</p><p>Clients establish a persistent connection, usually over WebSockets. The server keeps an in-memory representation of each active document, because reading from and writing to disk on every keystroke is far too slow for real-time interaction.</p><p>Every operation is:</p><ol><li><p>Validated.</p></li><li><p>Applied (with transformation or merge).</p></li><li><p>Appended to a durable log.</p></li><li><p>Broadcast to connected collaborators.</p></li></ol><p>The durable log matters more than most teams initially realize. Without it, you cannot recover the document state after a crash, and late joiners cannot reconstruct the document.</p><p>A common approach is event sourcing: the operation log is the source of truth, and the current document is derived by replaying it.</p><p><strong>Below is a simplified server implementation:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">interface EditOperation {
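  id: string;
  documentId: string;
  revision: number;
  payload: any;
  timestamp: number;
}

// Assumes `wss` is a WebSocketServer from the "ws" package, and that
// documentManager, conflictEngine, operationLog and broadcastToRoom are
// application services defined elsewhere (hypothetical names).
wss.on("connection", (socket) =&gt; {
  // The ws package emits raw "message" events, so we parse each one ourselves.
  socket.on("message", async (data) =&gt; {
    let operation: EditOperation;
    try {
      operation = JSON.parse(data.toString()); // 1. parse and (minimally) validate
    } catch {
      socket.send(JSON.stringify({ type: "error", reason: "invalid payload" }));
      return;
    }

    try {
      const doc = await documentManager.get(operation.documentId);

      // 2. apply: transform against concurrent operations, then update memory
      const transformed = conflictEngine.transform(doc, operation);
      doc.apply(transformed);

      await operationLog.save(transformed); // 3. append to the durable log

      broadcastToRoom(operation.documentId, transformed); // 4. broadcast
    } catch (error) {
      console.error("Edit failed", error);
      // Ask the client to reload state so it does not drift out of sync
      socket.send(JSON.stringify({ type: "resync", documentId: operation.documentId }));
    }
  });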
});</code></pre></div><p>This simplified implementation demonstrates:</p><ul><li><p>In-memory document state</p></li><li><p>Conflict resolution</p></li><li><p>Durable storage</p></li><li><p>Real-time broadcast</p></li></ul><p>With the core implementation out of the way, let&#8217;s look at some additions that are more challenging but make our collaborative backend more robust.</p><h2>Offline Editing and Reconciliation</h2><p>Imagine you&#8217;re working on a document and your internet connection drops for 5 minutes.</p><p>What happens to the content you typed during that time?</p><p>That&#8217;s where offline editing and reconciliation come in.</p><p>The frontend of your application should queue operations locally, and when the client reconnects:</p><ul><li><p>It sends its unsynced operations, each tagged with the version it was based on.</p></li><li><p>The server transforms them against every operation applied since that version.</p></li><li><p>The server applies the result and rebroadcasts it to the other collaborators.</p></li></ul><p>This prevents data loss. More importantly, it only works because of version tracking: without it, the server cannot tell which operations the offline client missed, and reconciliation becomes impossible.</p><h2>Scaling the System</h2><p>Building a collaborative editing system that works for a handful of users is one thing; scaling it is far from trivial.</p><p>Here&#8217;s how to approach it:</p><ul><li><p>Shard documents by <code>documentId</code> using consistent hashing.</p></li><li><p>Route all edits for a document to the same server node.</p></li><li><p>Keep collaboration servers stateless apart from the in-memory state of their active documents.</p></li><li><p>Use Redis to share ephemeral state, such as presence and cursor positions, if needed.</p></li><li><p>Snapshot documents periodically (for example, every 100 operations) to reduce replay time.</p></li></ul><p>When a node crashes:</p><ul><li><p>Reload the latest snapshot.</p></li><li><p>Replay the operations logged after that snapshot.</p></li><li><p>Resume service.</p></li></ul><p>This is event sourcing in practice.</p><p><strong>Most importantly, make sure to discuss your fault tolerance, reliability, monitoring, and observability strategies with your interviewer.</strong></p><h2><strong>Final Answer</strong></h2><div class="pullquote"><p>I would design the system using WebSockets for real-time communication, a document state manager that keeps active documents in memory, and an Operational Transformation engine to handle concurrent edits. I would store all operations in an append-only log with periodic snapshots for durability. To scale, I would shard documents using consistent hashing and route all edits for a document to the same node. I would add monitoring around latency, transformation time, and connection stability to ensure reliability. This ensures consistency, scalability, and fault tolerance in a real-time collaborative editing system.</p></div><h2><strong>Final Thoughts</strong></h2><p>Designing a collaborative editor might sound simple, but it is a deep dive into concurrency control, distributed systems, ordering guarantees, and durability.</p><p>Every keystroke becomes a distributed event. 
Every version number becomes a consistency boundary.</p><p>So next time an interviewer asks, &#8220;How would you design a real-time collaborative document editing backend?&#8221;</p><p>Don&#8217;t just say &#8220;I&#8217;ll use WebSockets.&#8221;</p><p>Walk them through how you&#8217;d handle concurrency, scaling, durability, and failure.</p><p>That&#8217;s how you show senior-level thinking.</p><div><hr></div><h3><strong>Join our AI Engineering Bootcamp</strong></h3><p>The answer to the question <strong>&#8220;Will AI Kill Backend Engineers?&#8221;</strong> boils down to this: backend engineers who embrace AI will replace those who don&#8217;t.</p><p>If you&#8217;re ready to LEARN AI and EMBRACE AI, we are launching a 6-week AI Bootcamp: &#8220;Become a Production-Ready AI Backend Engineer.&#8221;</p><p>We are backend engineers, and building production-ready systems has been our core skill. Learn exactly how to do the same as an AI Backend Engineer in the AI-first world.</p><p><strong>Join here: <a href="https://masteringai.dev?ref=backendweekly">Become a Production-Ready AI Backend Engineer</a></strong></p><div><hr></div><p>I hope you learned something today. Spread the love: share this newsletter with at least two of your friends.</p><p>Also, let me know if you enjoy this new series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:<br><br><strong>Get a 50% discount on any of these courses. 
Reach out to me by replying to this email.</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com/">The MB Platform:</a> </strong>Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learning and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a> </strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp designed to produce great backend engineers.</p><p><strong>3. 
<a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[I'm teaching my AI backend framework live (free)]]></title><description><![CDATA[Live workshop Friday &#8212; why most AI tutorials fail and what to do instead]]></description><link>https://kaperskyguru.substack.com/p/im-teaching-my-ai-backend-framework</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/im-teaching-my-ai-backend-framework</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Wed, 11 Feb 2026 09:31:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!szTj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#128075;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a href="https://masteringbackend.com">Masteringbackend</a> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and AI engineers through a practical, hands-on learning approach.</em></p><div><hr></div><div 
class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://kaperskyguru.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Backend Weekly! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>On Friday, I&#8217;m hosting a free live workshop:</p><p><strong>&#8220;The 6 Layers Every AI Backend Needs&#8221;</strong></p><p>It&#8217;s the framework I developed after multiple production failures &#8212; runaway agents, hallucinations, cost explosions.</p><p>I&#8217;m teaching it for free.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://luma.com/3jcfa4v0" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!szTj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg 424w, https://substackcdn.com/image/fetch/$s_!szTj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!szTj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!szTj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!szTj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg" width="800" height="419" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:419,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://luma.com/3jcfa4v0&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!szTj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg 424w, https://substackcdn.com/image/fetch/$s_!szTj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!szTj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!szTj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58501280-a49b-4f7c-9be8-9933e9b0ef34_800x419.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><strong>Why this matters for backend engineers:</strong></h2><p>Most AI education teaches you to call APIs.</p><p>That&#8217;s maybe 
10% of what you need to build AI systems that actually work.</p><p>The other 90%? It&#8217;s backend engineering:</p><ul><li><p>Error handling when the API fails</p></li><li><p>Cost management when usage spikes</p></li><li><p>Observability when outputs are wrong</p></li><li><p>Guardrails when agents misbehave</p></li></ul><p>That&#8217;s what this workshop covers.</p><h2><strong>What you&#8217;ll learn:</strong></h2><p>&#8594; Why AI tutorials fail in production (specific patterns) <br>&#8594; The 6-layer architecture for AI backends <br>&#8594; Live demo: Building a RAG endpoint with proper infrastructure <br>&#8594; Q&amp;A: Your questions answered</p><h2><strong>The details:</strong></h2><ul><li><p><strong>Date:</strong> Friday, 13th February, 2026</p></li><li><p><strong>Time:</strong> 4:00 PM UTC</p></li><li><p><strong>Duration:</strong> 90 minutes</p></li><li><p><strong>Cost:</strong> Free</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://luma.com/3jcfa4v0&quot;,&quot;text&quot;:&quot;Register here&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://luma.com/3jcfa4v0"><span>Register here</span></a></p><p>300+ engineers already registered. If you&#8217;re building with AI (or planning to), this is worth 90 minutes.</p><p>I&#8217;m also launching an AI Backend Engineer Bootcamp in March. The workshop teaches the framework; the bootcamp makes you build it. </p><p>Details at <a href="https://masteringai.dev">masteringai.dev</a> if you&#8217;re curious. But the workshop is valuable on its own, no bootcamp required.</p><div><hr></div><p>I hope you learned something today. Spread the love. Share this newsletter with at least two of your friends today.</p><p>If you have questions about the bootcamp, reply to this email. 
I read everything.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:</p><p><strong>Get a 50% discount on any of these courses. Reach out to me by replying to this email.</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1. <a href="https://app.masteringbackend.com/">The MB Platform:</a></strong> Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learning and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a></strong> The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. 
<a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying.</p><div><hr></div><p><strong>LAST WORD</strong> &#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</p><p></p>]]></content:encoded></item><item><title><![CDATA[Here's exactly what you'll build in 6 weeks]]></title><description><![CDATA[The complete curriculum, defense system, and how to get early access]]></description><link>https://kaperskyguru.substack.com/p/why-most-backend-engineers-will-fail</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/why-most-backend-engineers-will-fail</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 07 Feb 2026 08:00:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tzlx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9b724e-bb27-4b40-967d-6804efa6ea64_392x392.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#128075;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a href="https://masteringbackend.com">Masteringbackend</a> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and AI engineers through a practical, hands-on learning approach.</em></p><div><hr></div>
<p>Here&#8217;s another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and system design interview questions.</p><p><strong>Before we dive in:</strong></p><p>Your AI agents will return hallucinated results unless you give them the right data.<br><br>Nimble makes AI agents smarter by letting them retrieve real-time web data in a structured, tabular format.<br><br>With Nimble (@nimble_data), you can:<br><br>- Turn the web into data tables, not just long markdown text<br>- Get live web access, not stale indexes<br>- Access any website (even JS-heavy ones)<br><br>Imagine you're searching for something like:<br><br>"Which stores have a PS5 in stock within 25 miles right now&#8212;include price, pickup time, and store address from each retailer's site?"<br><br>This is a time-sensitive question, and your AI agents won't give the right result unless you use Nimble.<br><br>Watch this video:<br><br>Here's the documentation to start with: <a href="https://docs.nimbleway.com">https://docs.nimbleway.com</a></p><div class="native-video-embed" data-component-name="VideoPlaceholder" 
data-attrs="{&quot;mediaUploadId&quot;:&quot;8f96d0b2-d6ce-4b71-9378-81fef615838b&quot;,&quot;duration&quot;:null}"></div><div><hr></div><h2>Why I Built This Bootcamp</h2><p>Let me be direct.</p><p>Most backend engineers are learning AI incorrectly.</p><p>They&#8217;re taking prompt engineering courses. Building ChatGPT wrapper side projects. Adding &#8220;AI/ML&#8221; to their LinkedIn headlines. Watching tutorials on LangChain and vector databases.</p><p>And they still can&#8217;t build AI systems in production.</p><p>I know because I made every one of these mistakes.</p><p>18 months ago, I shipped my first production AI feature. Within two weeks, everything broke.</p><ul><li><p>A runaway agent racked up $400 in API costs overnight</p></li><li><p>A hallucination gave a user incorrect information</p></li><li><p>Memory leaks crashed our service</p></li><li><p>Vector search returned garbage the moment we scaled</p></li></ul><p>Every tutorial I watched was useless. They showed toy demos that fell apart the moment real users touched them.</p><p>So I threw out everything I thought I knew.</p><p>I stopped thinking like an &#8220;AI developer&#8221; and started thinking like a <strong>backend engineer who happens to work with AI</strong>.</p><p>That shift changed everything.</p><h2>The Core Insight</h2><p>Here&#8217;s what most AI education gets wrong:</p><p>They treat AI as a separate discipline.</p><p>It&#8217;s not.</p><p><strong>AI is backend infrastructure.</strong></p><p>RAG pipelines are just retrieval systems. Agents are just workflow orchestrators with LLM calls. Embeddings are mathematical representations that you query like any other data layer. 
Human-in-the-loop is just approval workflows with AI triggers.</p><p>Once you see AI through this lens, everything clicks.</p><p>The skills that make you a good backend engineer, such as system design, data modeling, error handling, observability, and cost management, are the same skills that make you a good AI backend engineer.</p><p>You just need to know how to apply them.</p><p>That&#8217;s what this bootcamp teaches.</p><div><hr></div><h2>The 6-Week Curriculum</h2><p>Here&#8217;s exactly what you&#8217;ll build each week:</p><h3>Week 1 &#8212; Activation System</h3><p>You ship your first working backend service.</p><p><strong>What you&#8217;ll build:</strong></p><ul><li><p>Authentication system (JWT and session handling)</p></li><li><p>Database schema with proper migrations</p></li><li><p>Input validation layer</p></li><li><p>Environment configuration (dev/staging/prod)</p></li><li><p>Basic API structure with error handling</p></li></ul><p><strong>What you&#8217;ll prove:</strong></p><ul><li><p>You can ship a deployable backend service</p></li><li><p>You understand production-grade foundations</p></li><li><p>You can explain your database design decisions</p></li></ul><h3>Week 2 &#8212; Business Logic Layer</h3><p>You add real business functionality to your system.</p><p><strong>What you&#8217;ll build:</strong></p><ul><li><p>Role-based access control (RBAC)</p></li><li><p>Background job processing</p></li><li><p>Third-party API integrations</p></li><li><p>Payment or workflow logic</p></li><li><p>Multi-entity business rules</p></li></ul><p><strong>What you&#8217;ll prove:</strong></p><ul><li><p>You can build systems that companies actually use</p></li><li><p>You understand authorization vs. authentication</p></li><li><p>You can model real business domains</p></li></ul><p>This is where most &#8220;AI developers&#8221; fall apart. 
They can copy tutorials, but they can&#8217;t build business systems.</p><h3>Week 3 &#8212; Production Hardening</h3><p>You make your system production-ready.</p><p><strong>What you&#8217;ll build:</strong></p><ul><li><p>Caching strategies (Redis, query caching)</p></li><li><p>Security baselines (input sanitization, rate limiting)</p></li><li><p>Centralized logging</p></li><li><p>Monitoring and alerting setup</p></li><li><p>Performance profiling</p></li></ul><p><strong>What you&#8217;ll prove:</strong></p><ul><li><p>You can ship systems that survive real traffic</p></li><li><p>You understand security at an infrastructure level</p></li><li><p>You can debug production issues</p></li></ul><p>After Week 3, you&#8217;ll have a backend system better than 90% of what junior-to-mid engineers ship. And we haven&#8217;t even touched AI yet.</p><h3>Week 4 &#8212; AI Infrastructure Layer</h3><p>Now we add AI. The right way.</p><p><strong>What you&#8217;ll build:</strong></p><ul><li><p>Vector database integration (Pinecone/Weaviate/Qdrant)</p></li><li><p>Embedding generation and storage</p></li><li><p>RAG pipeline architecture</p></li><li><p>AI agents with proper guardrails</p></li><li><p>Cost tracking and budget controls</p></li></ul><p><strong>What you&#8217;ll prove:</strong></p><ul><li><p>You understand embeddings as math, not magic</p></li><li><p>You can build RAG systems that actually work at scale</p></li><li><p>You can control agent behavior and costs</p></li></ul><p>Here&#8217;s a simple example of what cost control looks like:</p><pre><code>import Redis from "ioredis"; // assumed client; matches the lowercase decrby() call

// Assumed shapes for this example; your agent framework defines its own.
interface AgentTask { userId: string; prompt: string }
interface AgentResult { tokensUsed: number; cost: number }
declare const agent: { run(task: AgentTask): Promise&lt;AgentResult&gt; };
declare function logUsage(userId: string, tokens: number, cost: number): Promise&lt;void&gt;;

const redis = new Redis(); // connects to localhost:6379 by default

async function executeAgent(task: AgentTask): Promise&lt;AgentResult&gt; {
  const key = `budget:${task.userId}`;
  // Remaining token budget for this billing period, or null if none is set
  const remaining = await redis.get(key);

  if (remaining !== null &amp;&amp; parseInt(remaining, 10) &lt;= 0) {
    throw new Error("Token budget exceeded for this billing period");
  }

  const result = await agent.run(task);

  // Deduct the tokens this run used, then record usage and cost
  await redis.decrby(key, result.tokensUsed);
  await logUsage(task.userId, result.tokensUsed, result.cost);

  return result;
}</code></pre><p>This is the infrastructure the tutorials don&#8217;t show you.</p><h3>Week 5 &#8212; AI Systems Integration</h3><p>You build the systems that make AI safe and maintainable.</p><p><strong>What you&#8217;ll build:</strong></p><ul><li><p>Human-in-the-loop workflows</p></li><li><p>Conversation memory and context management</p></li><li><p>AI-specific observability</p></li><li><p>Hallucination detection and handling</p></li><li><p>Feedback loops for improvement</p></li></ul><p><strong>What you&#8217;ll prove:</strong></p><ul><li><p>You can build AI systems that humans can trust</p></li><li><p>You understand when AI should NOT make decisions</p></li><li><p>You can debug AI failures systematically</p></li></ul><p>Most AI systems fail because engineers treat AI as autonomous. It&#8217;s not.</p><h3>Week 6 &#8212; Defense</h3><p>This is where you prove everything.</p><p><strong>What you&#8217;ll do:</strong></p><ul><li><p>Present your complete system to your cohort</p></li><li><p>Walk through your architecture decisions</p></li><li><p>Explain your trade-offs</p></li><li><p>Debug a scenario I throw at you</p></li><li><p>Answer questions about scaling, security, and failure modes</p></li></ul><p><strong>What you&#8217;ll prove:</strong></p><ul><li><p>You understand what you built &#8212; not just that you built it</p></li><li><p>You can explain technical decisions to senior engineers</p></li><li><p>You can think on your feet when things break</p></li></ul><p>The defense is 60 minutes. I&#8217;ll ask you questions like:</p><ul><li><p>&#8220;Why did you choose this vector database?&#8221;</p></li><li><p>&#8220;What happens if your RAG pipeline returns irrelevant results?&#8221;</p></li><li><p>&#8220;How do you handle it when your agent exceeds budget?&#8221;</p></li><li><p>&#8220;Walk me through how you&#8217;d debug a hallucination in production.&#8221;</p></li></ul><p>You either know your system, or you don&#8217;t. That defense is your proof. 
Not a PDF certificate.</p><div><hr></div><h2>Why the Defense Matters</h2><p>Most AI education optimizes for engagement. This bootcamp optimizes for transformation.</p><p>You ship code every week. You get feedback on real systems. You defend your architecture to prove you understand it.</p><p><strong>You can&#8217;t fake your way through a defense.</strong></p><p>You either understand your system, or you don&#8217;t. There&#8217;s no middle ground. That&#8217;s why the certificate from this bootcamp actually means something.</p><p>Because it&#8217;s backed by proof.</p><div><hr></div><h2>What Happens Next</h2><ol><li><p><strong>Join the waitlist</strong> at <a href="https://masteringai.dev">masteringai.dev</a></p></li><li><p><strong>This Friday:</strong> Free live workshop where I teach the 6-layer framework for AI backend systems, the same framework the bootcamp is built on</p></li><li><p><strong>After the workshop:</strong> Early-bird enrollment opens (waitlist members get first access and priority spots)</p></li></ol><p><strong>50 spots. That&#8217;s it.</strong></p><p>I&#8217;m keeping the first cohort small because everyone gets code reviews, everyone gets feedback, and everyone defends their system.</p><p>That doesn&#8217;t scale to hundreds of students. The waitlist is the only way in. <strong><a href="https://masteringai.dev">masteringai.dev</a></strong></p><div><hr></div><h2>Final Thoughts</h2><p>We are already in an AI-first world.</p><p>Every platform and system you maintain will either be enhanced with AI or rebuilt around it.</p><p>As backend engineers, our role isn&#8217;t just to manage the persistence layer. 
Most importantly, it is to architect the intelligence layer itself.</p><p>Many of us are still focused on optimizing the foundation while the entire skyscraper (the application) is being redesigned around us.</p><p>The engineers who can build AI infrastructure, not just use AI tools, will be the ones who command premium salaries and interesting projects.</p><p>This bootcamp doesn&#8217;t promise you a job. It promises you capability.</p><p>What you do with it is up to you.</p><p><strong>Upskill now.</strong></p><p>&#8594; <strong><a href="https://masteringai.dev">masteringai.dev</a></strong></p><div><hr></div><p>I hope you learned something today. Spread the love. Share this newsletter with at least two of your friends today.</p><p>If you have questions about the bootcamp, reply to this email. I read everything.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:</p><p><strong>Get a 50% discount on any of these courses. Reach out to me by replying to this email.</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend 
engineer:</strong></p><p><strong>1. <a href="https://app.masteringbackend.com/">The MB Platform:</a></strong> Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learning and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a></strong> The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. <a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying.</p><div><hr></div><p><strong>LAST WORD</strong> &#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[The Hidden Gap: Backend Engineers vs AI Engineers]]></title><description><![CDATA[The hidden gaps between backend engineers and AI engineers.]]></description><link>https://kaperskyguru.substack.com/p/ive-been-quietly-building-something</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/ive-been-quietly-building-something</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Wed, 04 Feb 2026 14:11:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tzlx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9b724e-bb27-4b40-967d-6804efa6ea64_392x392.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#128075;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a href="https://masteringbackend.com">Masteringbackend</a> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and AI engineers through a practical, hands-on learning approach.</em></p><div><hr></div>
<p>Here&#8217;s another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and system design interview questions.</p><p><strong>Before we dive in:</strong></p><p>I need to tell you about something I&#8217;ve been working on quietly for the past few months. I wasn&#8217;t sure when I&#8217;d share it, but this feels like the right time because many of you have been asking me questions that led me to build this in the first place.</p><p>More on that below.</p><h3>A Conversation That Kept Repeating</h3><p>A few months ago, I had a call with a backend engineer who had been building APIs and services for about three years. He&#8217;s a solid engineer who knows his way around databases, caching, authentication, and all the core backend stuff.</p><p>He asked me a simple question: </p><p>&#8220;Solomon, how do I actually add AI to my backend systems?&#8221;</p><p>I started explaining some approaches, and then he stopped me and said something that stuck with me: </p><p>&#8220;I&#8217;ve watched tutorials. I can call the OpenAI API. I can get responses. But when I try to put this into a real system with users, errors, costs, and things that break at 2 am, I have no idea what I&#8217;m doing.&#8221;</p><p>That conversation didn&#8217;t happen once. It happened again the following week with a different engineer. And then again. 
And again.</p><p>I started paying closer attention to the questions coming into my inbox, the DMs on Twitter, and the conversations in our community. And I realized there was a pattern that I had somehow missed.</p><p>Backend engineers are not struggling to understand what AI is. They&#8217;re struggling to build production systems that use AI reliably.</p><h2>The Gap I Kept Seeing</h2><p>Let me explain what I mean by that, because it took me a while to fully understand it myself.</p><p>When most backend engineers approach AI, they go through the same journey. They sign up for OpenAI, get an API key, write a few prompts, and see the magic happen. Text prompts go in, and intelligent text responses come out. It feels like the future.</p><p>Then they try to put it into a real application. And suddenly, questions start piling up that no tutorial seems to answer. </p><p>What happens when the API is slow or down entirely, and your users are waiting? How do you know how much each user is costing you in API calls? What do you do when the model returns something completely wrong, and that wrong answer is about to go to a customer? </p><p>How do you change your prompts without breaking things that were working yesterday? How do you even know if something is broken when the outputs are non-deterministic?</p><p>These are not AI questions. These are backend engineering questions. But nobody seems to be teaching the two together.</p><p>I went looking for resources that approached AI from a backend engineer&#8217;s perspective, not from a data scientist&#8217;s or an ML researcher&#8217;s perspective. </p><p>I couldn&#8217;t find what I was looking for. Most courses assume you want to fine-tune models, understand transformer architecture, or build ML pipelines. That&#8217;s valuable knowledge, but it&#8217;s not what backend engineers need to ship AI features in production.</p><p>What we need is much more practical. 
</p><p>We need to know how to build the infrastructure around AI that makes it reliable, observable, cost-controlled, and maintainable. We need to treat AI like any other external dependency in our systems, with proper error handling, fallbacks, monitoring, and controls.</p><h2>What I Started Building</h2><p>So I started building something to fill this gap.</p><p>I didn&#8217;t announce it because I wanted to make sure it was actually useful before I talked about it. I&#8217;ve been refining it, testing ideas, talking to engineers, and putting together what I believe is the right approach for backend engineers who want to add AI capabilities to their skill set.</p><p>It&#8217;s not a course where you watch videos and collect a certificate at the end. I&#8217;ve seen too many engineers go through programs like that and come out unable to build anything real. </p><p>They understand concepts but can&#8217;t ship systems.</p><p>Instead, what I&#8217;ve built is a structured program where you actually build things. Every week, you ship code. Real code that does real things. </p><p>By the end, you have a complete AI-powered backend system that you built yourself, and you have to present it and explain your decisions, the same way you would in a senior engineering interview or a design review at work.</p><p>The focus is on what I call <strong>AI infrastructure</strong>. </p><p>Things like building RAG pipelines that actually work at scale, with proper chunking strategies, caching, and citation tracking. Building AI agents that have guardrails so they don&#8217;t run forever and cost you thousands of dollars. </p><p>Implementing <strong>human-in-the-loop systems</strong> where uncertain outputs get routed to a person for review instead of going straight to your users. 
</p><p>Setting up a proper <strong>Model Control Plane</strong> where you can version your prompts, set up fallbacks when one model fails, and track costs across your entire system.</p><p>These are the patterns that separate a demo from a production system. These are the things I wish someone had taught me when I first started integrating AI into backend services.</p><h2>Why I&#8217;m Telling You Now</h2><p><strong>I&#8217;m sharing this now because enrollment is opening soon, and I wanted you to hear about it first.</strong></p><p>You&#8217;ve been reading <strong>Backend Weekly</strong>, you&#8217;ve been part of this community, and many of you have sent me the exact questions that led me to build this. It felt wrong to announce it publicly without telling you first.</p><p>I also want to be honest about who this is for, because it&#8217;s not for everyone.</p><p>This is for backend engineers who already know how to build things. You should be comfortable with APIs, databases, authentication, and the fundamentals. You don&#8217;t need to be a senior engineer, but you should have shipped real backend code before. </p><p>If you&#8217;re still learning the basics of backend development, this isn&#8217;t the right time. Focus on the fundamentals first, and come back to AI later.</p><p>But if you&#8217;ve been building backends and wondering how to add AI to your skill set in a way that makes you more valuable (not just someone who can write prompts, but someone who can architect AI systems), then this might be exactly what you&#8217;ve been looking for.</p><h2>What Happens Next</h2><p>I&#8217;ll be sharing full details later this week, including the curriculum, the schedule, and how to join.</p><p>If you want to make sure you don&#8217;t miss it, just reply to this email with &#8220;INTERESTED&#8221; and I&#8217;ll add you to the early notification list. </p><p>I&#8217;ll also answer any questions you have directly. 
Just hit reply and ask.</p><p>I&#8217;ve been building this quietly for months, and I&#8217;m genuinely excited to finally share it with you.</p><p>More soon.</p><div><hr></div><p>I hope you found this useful. If you did, share this newsletter with a friend who's been asking how to get into AI as a backend engineer. There's a good chance they're facing the same gap.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:</p><p><strong>Get a 50% discount on any of these courses. Reach out to me (Reply to this mail)</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1. <a href="https://app.masteringbackend.com/">The MB Platform:</a></strong> Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. 
<a href="https://masteringbackend.com/academy">The MB Academy:</a></strong> The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. <a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD</strong> &#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome, Solomon (<a href="https://solomoneseme.com">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[How would you design a globally consistent payment ledger?]]></title><description><![CDATA[How real payment systems stay correct, auditable, and scalable &#8212; even at global scale.]]></description><link>https://kaperskyguru.substack.com/p/how-would-you-design-a-globally-consistent</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/how-would-you-design-a-globally-consistent</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 03 Jan 2026 15:26:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xviY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#128075;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a 
href="https://masteringbackend.com/?ref=backend-weekly&amp;utm_source=newsletter.masteringbackend.com&amp;utm_medium=newsletter&amp;utm_campaign=api-and-api-design-building-enterprise-apis&amp;_bhlid=c51e58543792da03dbe2c0ad9109a794492143ad">Masteringbackend</a><strong> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and AI engineers through a practice-based learning approach</strong></em>.</p><div><hr></div><p>Welcome to another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world system design and interview questions.</p><p><strong>Before we dive in:</strong></p><p>Enrollment for our upcoming AI Bootcamp has started, and seats are filling up fast with engineers who are ready to embrace the AI-first world.</p><p>We are already in an AI-first world, where every platform and system you maintain will either be AI-enhanced or completely rebuilt around AI solutions. </p><p>As backend engineers, our role isn&#8217;t just to manage the persistence layer; most importantly, it is to architect the intelligence layer itself. 
</p><p>Many of us are still focused on optimizing the foundation while the entire skyscraper (the application) is being redesigned around us.</p><p><strong>Upskill now.</strong></p><p><a href="https://masteringbackend.com/ai-bootcamp?ref=backendweekly">Join the </a><strong><a href="https://masteringbackend.com/ai-bootcamp?ref=backendweekly">&#8220;Becoming a Production-Ready AI Engineer.&#8221;</a></strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://masteringbackend.com/ai-bootcamp?ref=backendweekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xviY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xviY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xviY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xviY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xviY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://masteringbackend.com/ai-bootcamp?ref=backendweekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xviY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xviY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xviY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xviY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>If you have any questions about the bootcamp, please send them to hi [at] masteringbackend.com</strong></p><div><hr></div><p><strong>This is the MB Interview Series on Backend Weekly, which airs every Saturday.</strong></p><p><em>In this series, I will guide you through answering common backend engineering interview questions, covering topics such as system design, microservices, API design, and databases.</em></p><p>Let&#8217;s get started with episode 5 (<a href="https://kaperskyguru.substack.com/p/how-would-you-design-a-distributed-aac">Episode 4 Here</a>):</p><div><hr></div><h2><strong>The Interview Scenario</strong></h2><p>You&#8217;re in a backend interview.</p><p>They ask:</p><p><em><strong>&#8220;How would you design a globally consistent payment ledger?&#8221;</strong></em></p><p>Here&#8217;s how to approach it:</p><div><hr></div><p>Also, we are building our next Interview Prep Playground for backend engineers.</p><p>Join our MB Interview waitlist: <a href="https://tally.so/r/w46glb">https://tally.so/r/w46glb</a></p><div><hr></div><p>Building a FinTech product is difficult. It takes the <strong>utmost attention to detail</strong> to get it right. </p><p>Even when you do, it will still undergo thorough rounds of testing and auditing to ensure that money doesn&#8217;t slip away.</p><p>Why?</p><p>Building a FinTech product is not just about the API, CRUD, databases, and storing balances. The true foundation of a successful FinTech product is <strong>trust and ease of payment.</strong></p><p>It&#8217;s about <strong>correctness under pressure</strong>.</p><p>The payment ledger is at the heart of every FinTech product because it is the single source of truth for every transaction that happens within the system.</p><p>So how do you build a globally consistent payment ledger that&#8217;s distributed, handles concurrent writes, serves millions of users, and still remains the single source of truth?</p><p>This is a real challenge. 
Let&#8217;s dive in:</p><p>As you already know, a strong backend engineer does not jump into coding without clarifying the requirements. So go through a few rounds of requirement clarification with your interviewer.</p><h2>Clarify the Requirements</h2><p>Before jumping into architecture diagrams and implementation, we must clarify what must never break.</p><p>Discuss this with your interviewer.</p><p>For a payment ledger, the core requirements are:</p><ul><li><p><strong>Strong correctness for balances</strong> (no money created or lost)</p></li><li><p><strong>Exactly-once payment application</strong></p></li><li><p><strong>Global availability across regions</strong></p></li><li><p><strong>Auditable, append-only history</strong></p></li><li><p><strong>High write throughput under load</strong></p></li></ul><p>This clarification gives you a clear direction on exactly what to do and what to avoid.</p><h2>What Is a Payment Ledger?</h2><p>A payment ledger is <strong>not</strong> a balance table. It is an <strong>immutable, append-only record of financial events</strong>.</p><p>Balances are <strong>derived data</strong>. The ledger is the <strong>source of truth</strong>.</p><p>Once you internalize this, the rest of the design becomes much clearer.</p><p>A payment ledger stores a record of every transaction that has ever occurred in your system. 
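</p><p>To make &#8220;balances are derived data&#8221; concrete, here is a minimal in-memory sketch in TypeScript. The names (<code>AppendOnlyLedger</code>, <code>postTransaction</code>, <code>balanceOf</code>) are hypothetical, amounts are assumed to be integer minor units such as cents, and the rule that every transaction&#8217;s legs sum to zero is a double-entry convention assumed here to enforce &#8220;no money created or lost&#8221;. It is an illustration, not a production ledger.</p><pre><code>// Amounts are integer minor units (e.g. cents): debits negative, credits positive.
type Entry = Readonly&lt;{
  offset: number; // position in the append-only log
  transactionId: string;
  accountId: string;
  amount: number;
}&gt;;

type Leg = { accountId: string; amount: number };

class AppendOnlyLedger {
  private readonly entries: Entry[] = [];

  // Appends every leg of one transaction, or nothing if validation fails.
  postTransaction(transactionId: string, legs: Leg[]): void {
    if (legs.length &lt; 2) throw new Error("a transaction needs at least two legs");
    if (legs.some((leg) =&gt; !Number.isSafeInteger(leg.amount))) {
      throw new Error("amounts must be integer minor units");
    }
    const sum = legs.reduce((acc, leg) =&gt; acc + leg.amount, 0);
    if (sum !== 0) throw new Error(`unbalanced transaction: legs sum to ${sum}`);

    for (const leg of legs) {
      // Frozen entries: history can be appended to, never edited.
      this.entries.push(Object.freeze({ offset: this.entries.length, transactionId, ...leg }));
    }
  }

  // A balance is derived by folding over the log; it is never stored as truth here.
  balanceOf(accountId: string): number {
    return this.entries
      .filter((e) =&gt; e.accountId === accountId)
      .reduce((acc, e) =&gt; acc + e.amount, 0);
  }

  history(): readonly Entry[] {
    return this.entries;
  }
}

// Usage: a payment, then a compensating transaction instead of an edit
const ledger = new AppendOnlyLedger();
ledger.postTransaction("txn_123", [
  { accountId: "acct_789", amount: -5000 },
  { accountId: "acct_456", amount: 5000 },
]);
ledger.postTransaction("txn_123_reversal", [
  { accountId: "acct_789", amount: 5000 },
  { accountId: "acct_456", amount: -5000 },
]);
console.log(ledger.balanceOf("acct_789"), ledger.history().length); // 0 4
</code></pre><p>The reversal is appended as a new, opposite transaction rather than by editing the first one, so the history still explains how the balance got back to zero. 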
</p><p>It doesn&#8217;t matter if it&#8217;s a failed, successful, pending, or reversed transaction; everything is stored exactly once in the ledger.</p><p>Now that the requirements are out of the way, let&#8217;s define the architecture that will solve this problem for us.</p><h2>High-Level Architecture</h2><p>A production-grade payment ledger is typically composed of these core components:</p><ul><li><p><strong>Payment API Layer</strong></p></li><li><p><strong>Ledger Write Service</strong></p></li><li><p><strong>Immutable Transaction Log</strong></p></li><li><p><strong>Balance Service</strong></p></li></ul><p>Each component has a very specific responsibility, and mixing them is how systems fail. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lW97!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lW97!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png 424w, https://substackcdn.com/image/fetch/$s_!lW97!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png 848w, https://substackcdn.com/image/fetch/$s_!lW97!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lW97!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lW97!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png" width="1456" height="843" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:843,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:272976,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/183330247?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lW97!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png 424w, https://substackcdn.com/image/fetch/$s_!lW97!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png 848w, 
https://substackcdn.com/image/fetch/$s_!lW97!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png 1272w, https://substackcdn.com/image/fetch/$s_!lW97!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97f0973-7e40-4bb0-be6b-90f6ad0278f4_2355x1363.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is a good time to discuss with your interviewer the responsibilities of each component and how they will interact with 
each other.</p><p>Let&#8217;s break them down:</p><h3><strong>Payment API Layer</strong></h3><p><strong>The gatekeeper of correctness in a payment ledger system. </strong>The <strong>Payment API Layer</strong> is the <strong>front door</strong> to your payment system.</p><p>At a high level, the Payment API Layer does <strong>five critical things</strong>:</p><ol><li><p><strong>Accepts payment intent</strong> from clients</p></li><li><p><strong>Validates requests</strong> (auth, schema, limits, currency rules)</p></li><li><p><strong>Assigns an idempotency key</strong></p></li><li><p><strong>Forwards the request to the Ledger Write Service</strong></p></li><li><p><strong>Returns a commit result (or a safe retry response)</strong></p></li></ol><p>It is intentionally <strong>thin</strong>, <strong>stateless</strong>, and <strong>strict</strong>.</p><p>Here&#8217;s a simple implementation of the payment API layer:</p><pre><code>import { Request, Response } from "express";
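import { v4 as uuid } from "uuid";
import { redis } from "./redis"; // assumed: a connected node-redis v4 client
import { publishToLedger } from "./ledger"; // assumed: resolves with the committed result

const IDEMPOTENCY_TTL_SECONDS = 60 * 60; // 1 hour

export async function createPayment(req: Request, res: Response) {
  const { fromAccount, toAccount, amount, currency } = req.body;

  // Prefer the client's key: a key generated here is new on every retry,
  // so it can label the request but cannot deduplicate it.
  const idempotencyKey = req.header("idempotency-key") ?? uuid();

  // 1. Validate input (amount as an integer in minor units, e.g. cents)
  if (
    !fromAccount ||
    !toAccount ||
    fromAccount === toAccount ||
    !Number.isInteger(amount) ||
    amount &lt;= 0 ||
    !currency
  ) {
    return res.status(400).json({ error: "Invalid payment request" });
  }

  // 2. Claim the key atomically (SET ... NX) so two concurrent retries
  //    cannot both pass a separate "check, then write" step
  const cacheKey = `idem:${idempotencyKey}`;
  const claimed = await redis.set(cacheKey, "PENDING", {
    NX: true,
    EX: IDEMPOTENCY_TTL_SECONDS,
  });

  if (claimed === null) {
    const existing = await redis.get(cacheKey);
    if (existing === null || existing === "PENDING") {
      // The original request is still in flight (or its claim just expired)
      return res.status(409).json({ error: "Payment is already being processed" });
    }
    // Return the previous result safely
    return res.status(200).json(JSON.parse(existing));
  }

  // 3. Build payment intent
  const paymentIntent = {
    intentId: uuid(),
    fromAccount,
    toAccount,
    amount,
    currency,
    idempotencyKey,
    createdAt: Date.now(),
  };

  // 4. Send to ledger writer
  let result: unknown;
  try {
    result = await publishToLedger(paymentIntent);
  } catch {
    // Release the claim so the client can retry with the same key; the
    // Ledger Write Service's own idempotency check still prevents a double
    // write if the failed attempt actually committed.
    await redis.del(cacheKey);
    return res.status(503).json({ error: "Ledger unavailable, please retry" });
  }

  // 5. Store result for idempotency replay
  await redis.set(cacheKey, JSON.stringify(result), {
    EX: IDEMPOTENCY_TTL_SECONDS,
  });

  return res.status(201).json(result);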
}
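</code></pre><p>Validation rules are easier to test when they live in a pure function instead of inside the Express handler. Below is a minimal sketch of the &#8220;validates requests&#8221; step; the specific rules (integer minor units, a three-letter uppercase currency code, distinct accounts, a per-request limit) and the names <code>validatePaymentIntent</code> and <code>MAX_AMOUNT_MINOR</code> are illustrative assumptions rather than part of the original design, and auth checks are left out.</p><pre><code>type PaymentRequest = {
  fromAccount?: unknown;
  toAccount?: unknown;
  amount?: unknown;
  currency?: unknown;
};

// Hypothetical per-request limit in minor units (10,000.00 in a two-decimal currency)
const MAX_AMOUNT_MINOR = 1_000_000;

// Returns every problem found; an empty array means the request passes.
function validatePaymentIntent(body: PaymentRequest): string[] {
  const errors: string[] = [];
  const { fromAccount, toAccount, amount, currency } = body;

  if (typeof fromAccount !== "string" || fromAccount.length === 0) {
    errors.push("fromAccount is required");
  }
  if (typeof toAccount !== "string" || toAccount.length === 0) {
    errors.push("toAccount is required");
  }
  if (typeof fromAccount === "string" &amp;&amp; fromAccount === toAccount) {
    errors.push("fromAccount and toAccount must differ");
  }

  // Integers only: fractional amounts invite floating-point rounding errors
  if (typeof amount !== "number" || !Number.isSafeInteger(amount) || amount &lt;= 0) {
    errors.push("amount must be a positive integer in minor units");
  } else if (amount &gt; MAX_AMOUNT_MINOR) {
    errors.push("amount exceeds the per-request limit");
  }

  // Shape check only (e.g. "USD"); a real service would also check a supported-currency list
  if (typeof currency !== "string" || !/^[A-Z]{3}$/.test(currency)) {
    errors.push("currency must be a three-letter ISO 4217 code");
  }

  return errors;
}

// Usage
console.log(validatePaymentIntent({ fromAccount: "acct_1", toAccount: "acct_2", amount: 5000, currency: "USD" })); // []
console.log(validatePaymentIntent({ fromAccount: "acct_1", toAccount: "acct_1", amount: 50.5, currency: "usd" }).length); // 3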
</code></pre><h3><strong>Ledger Write Service</strong></h3><p><strong>The single authority that decides what becomes financial truth. </strong>If the Payment API Layer captures <strong>intent</strong>, the <strong>Ledger Write Service</strong> decides <strong>truth</strong>.</p><p>This service is the <strong>only component allowed to write to the ledger</strong>.</p><p>Below are the core responsibilities of the Ledger Write Service:</p><ul><li><p><strong>Enforce Exactly-Once Writes:</strong> It must guarantee that the same transaction is never applied twice, even under retries. This is enforced with idempotency keys, transaction uniqueness constraints, and atomic writes.</p></li><li><p><strong>Append to an Immutable Ledger:</strong> The ledger is append-only and ordered; entries are never mutated or deleted. If something is wrong, you <strong>append a compensating entry</strong>; you don&#8217;t fix history.</p></li><li><p><strong>Guarantee Ordering (Per Account):</strong> Payments touching the same account must be processed <strong>in order</strong>.</p></li><li><p><strong>Emit Deterministic Events: </strong>Every ledger entry must be deterministic, replayable, and self-contained. This enables recovery, audits, and balance rebuilding.</p></li></ul><p>Below is a simple structure of a ledger entry (assuming amounts are stored in minor units such as cents, <code>-5000</code> is a $50.00 debit):</p><pre><code>{
  "ledgerEntryId": "ledg_01HXYZ",
  "transactionId": "txn_123",
  "idempotencyKey": "idem_abc",
  "accountId": "acct_789",
  "amount": -5000,
  "currency": "USD",
  "type": "DEBIT",
  "createdAt": 1736359200
}</code></pre><p>Here&#8217;s a sample code snippet implementing the ledger write path:</p><pre><code>import { db } from "./db"; // assumed: the project's database client

// One ledger entry, matching the JSON structure above
type LedgerEntry = {
  transactionId: string;
  idempotencyKey: string;
  accountId: string;
  amount: number; // minor units; negative for debits
  currency: string;
  type: "DEBIT" | "CREDIT";
};

export async function appendLedgerEntry(entry: LedgerEntry) {
  return db.transaction(async (tx) =&gt; {
    // 1. Enforce idempotency: a retry gets back the entry written the first time
    const existing = await tx.ledger.findUnique({
      where: { idempotencyKey: entry.idempotencyKey },
    });

    if (existing) {
      return existing;
    }

    // 2. Append-only write. The lookup above is not enough on its own: two
    //    concurrent requests can both miss it, so the ledger table also needs
    //    a UNIQUE constraint on idempotencyKey that makes the second insert
    //    fail (and roll back) instead of writing the entry twice.
    const written = await tx.ledger.create({
      data: {
        transactionId: entry.transactionId,
        idempotencyKey: entry.idempotencyKey,
        accountId: entry.accountId,
        amount: entry.amount,
        currency: entry.currency,
        type: entry.type,
      },
    });

    return written;
  });
}</code></pre><h3><strong>Immutable Transaction Log</strong></h3><p><strong>The permanent record of financial truth. </strong>The <strong>Immutable Transaction Log</strong> is the heart of a payment ledger system.</p><p>It is the place where <strong>every financial event is written once and never changed</strong>.</p><p>If balances are wrong, services crash, or caches are wiped, the immutable log is what you rebuild everything from.</p><h4>Why Is an Immutable Log Required?</h4><p>Money systems have non-negotiable constraints:</p><ul><li><p>You must explain every cent</p></li><li><p>You must prove how a balance was formed</p></li><li><p>You must recover from bugs and outages</p></li><li><p>You must support audits years later</p></li></ul><h3><strong>Balance Service</strong></h3><p>If you want to show a user their current balance, computing it from the ledger on every request would mean:</p><ul><li><p>Scanning thousands (or millions) of entries</p></li><li><p>High latency</p></li><li><p>Expensive reads</p></li></ul><p>So we <strong>derive</strong> balances instead:</p><ul><li><p>Precompute balances per account</p></li><li><p>Store them in optimized tables or caches</p></li><li><p>Rebuild them from the ledger at any time if needed</p></li></ul><p>This gives you fast reads without sacrificing the correctness of the ledger.</p><p>Here&#8217;s a sample code snippet of a balance projection that applies ledger entries in order:</p><pre><code>import { db } from "./db"; // assumed: the same database client as above

type LedgerEntry = {
  accountId: string;
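  amount: number; // minor units; positive for credits, negative for debits
  offset: number; // position in the ordered log; increases monotonically per account (or partition)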
};

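// Assumes a single consumer per account (or per partition) applies entries
// in offset order, so this read-modify-write never runs concurrently for
// the same account.
async function applyLedgerEntry(entry: LedgerEntry) {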
  await db.transaction(async (tx) =&gt; {
    const balance = await tx.balance.findUnique({
      where: { accountId: entry.accountId },
    });

    if (balance &amp;&amp; balance.lastOffset &gt;= entry.offset) {
      // Already applied &#8594; idempotent
      return;
    }

    const newBalance = (balance?.balance ?? 0) + entry.amount;

    await tx.balance.upsert({
      where: { accountId: entry.accountId },
      update: {
        balance: newBalance,
        lastOffset: entry.offset,
        updatedAt: new Date(),
      },
      create: {
        accountId: entry.accountId,
        balance: newBalance,
        lastOffset: entry.offset,
        updatedAt: new Date(),
      },
    });
  });
}</code></pre><p>Now that the architecture is out of the way, let&#8217;s explore the primary flow of a globally consistent ledger system:</p><h2><strong>Primary Payment Flow</strong></h2><p>Here&#8217;s the primary flow you should describe in an interview:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R3c8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R3c8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png 424w, https://substackcdn.com/image/fetch/$s_!R3c8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png 848w, https://substackcdn.com/image/fetch/$s_!R3c8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png 1272w, https://substackcdn.com/image/fetch/$s_!R3c8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R3c8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png" width="1135" height="691" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:691,&quot;width&quot;:1135,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64625,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/183330247?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R3c8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png 424w, https://substackcdn.com/image/fetch/$s_!R3c8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png 848w, https://substackcdn.com/image/fetch/$s_!R3c8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png 1272w, https://substackcdn.com/image/fetch/$s_!R3c8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7b0f80-8e3d-4577-a5a9-3a9753697a78_1135x691.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ol><li><p><strong>Client submits a payment intent</strong></p></li><li><p><strong>API validates the request</strong> and assigns an <strong>idempotency key</strong></p></li><li><p><strong>Ledger service appends the transaction</strong> to the immutable log</p></li><li><p><strong>Balance service updates derived account state</strong></p></li><li><p><strong>Client receives a commit result</strong></p></li></ol><p>Notice something important:</p><div class="pullquote"><p>The balance update is <strong>not</strong> the primary operation. The ledger append is.</p></div><p>Everything we have discussed so far covers the architecture. 
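</p><p>The ordering in the flow above (append to the log first, derive balances second) is what makes recovery possible. Here is a minimal in-memory sketch of a balance projection, assuming a hypothetical <code>LogEntry</code> shape with a log-wide <code>offset</code>; unlike the Balance Service snippet above, which tracks <code>lastOffset</code> per account, this simplification tracks one offset for the whole log. Entries at or below the last applied offset are skipped, so redelivering them after a crash is harmless, and replaying from genesis rebuilds the same balances.</p><pre><code>// Hypothetical shape: amounts in minor units, offsets assigned by the log
type LogEntry = { offset: number; accountId: string; amount: number };

// Derived state: balances plus the highest log offset already applied
type Projection = { balances: Map&lt;string, number&gt;; lastOffset: number };

function emptyProjection(): Projection {
  return { balances: new Map(), lastOffset: -1 };
}

// Applies entries in offset order (the log is assumed to be sorted). Anything
// at or below lastOffset was already applied, so redelivery changes nothing.
function project(log: readonly LogEntry[], state: Projection): Projection {
  for (const entry of log) {
    if (entry.offset &lt;= state.lastOffset) continue;
    const current = state.balances.get(entry.accountId) ?? 0;
    state.balances.set(entry.accountId, current + entry.amount);
    state.lastOffset = entry.offset;
  }
  return state;
}

// Usage: a crash after the first entry, a redelivery, then a full replay
const log: LogEntry[] = [
  { offset: 0, accountId: "acct_789", amount: -5000 },
  { offset: 1, accountId: "acct_456", amount: 5000 },
];
const live = project(log.slice(0, 1), emptyProjection()); // crashed after offset 0
project(log, live); // consumer restarts and receives both entries again
const rebuilt = project(log, emptyProjection()); // replay from genesis
console.log(live.balances.get("acct_456"), rebuilt.balances.get("acct_456")); // 5000 5000
</code></pre><p>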
Now is the time to impress your interviewer by walking them through how you intend to scale your Fintech system and keep it reliable.</p><h2><strong>Reliability &amp; Correctness Guarantees</strong></h2><p>When you discuss reliability and correctness, here&#8217;s what to tell your interviewer:</p><ul><li><p><strong>Exactly-once semantics:</strong> Achieved via idempotency keys. A retried or duplicate request resolves to the same committed transaction instead of creating a new one.</p></li><li><p><strong>Atomic ledger append:</strong> A transaction is either fully recorded or not recorded at all. No partial writes.</p></li><li><p><strong>Deterministic replay:</strong> If balances are corrupted, you can replay the ledger from genesis to rebuild them. This is critical for recovery and audits.</p></li><li><p><strong>Read-after-write consistency (per account):</strong> Once a transaction commits, reads for that account must reflect it.</p></li></ul><p>These guarantees are non-negotiable for a reliable ledger in a Fintech product.</p><h3>Scaling the Ledger</h3><p>Scaling payments is not about adding more databases. It&#8217;s about isolating contention. 
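</p><p>One common way to isolate contention is to give every account exactly one owning shard and route all of its writes there with a stable hash. Here is a sketch of that in Python under stated assumptions. The <code>shard_for</code> and <code>route</code> helpers are hypothetical. The shard count is assumed to be fixed, and each shard is assumed to have a single ledger writer that serializes writes for its own accounts.</p><pre><code>
```python
import hashlib
from collections import defaultdict


def shard_for(account_id, num_shards):
    """Return the index of the shard that owns account_id.

    The built-in hash() is salted per process for strings, so a SHA-256
    digest is used instead; every API node then computes the same answer.
    """
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(writes, num_shards):
    """Group (account_id, amount) writes by their owning shard.

    Each shard's single writer only ever sees its own accounts, so writes
    to one account never wait on a lock held by another shard.
    """
    buckets = defaultdict(list)
    for account_id, amount in writes:
        buckets[shard_for(account_id, num_shards)].append((account_id, amount))
    return dict(buckets)


writes = [("acct-A", 5_000), ("acct-B", -200), ("acct-A", -1_200)]
for shard, batch in sorted(route(writes, num_shards=4).items()):
    print(shard, batch)  # both acct-A writes always land in the same batch
```
</code></pre><p>Plain modulo hashing moves most accounts to a different shard whenever the shard count changes, so a real deployment would more likely use consistent hashing or a shard directory. A transfer between two accounts on different shards also still needs a cross-shard protocol (for example, a saga), which this sketch does not cover.</p><p>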
Discuss this with your interviewer and share your opinion on scaling the system.</p><p>Here are some scaling strategies you can adopt:</p><ul><li><p>Horizontal scaling of the API tier and the ledger writers.</p></li><li><p>Account-based sharding of the ledger.</p></li><li><p>A stateless API tier.</p></li><li><p>Stateful storage isolated behind shard boundaries.</p></li></ul><p>Each account (or group of accounts) maps to exactly one shard, ensuring:</p><ul><li><p>No cross-account locking.</p></li><li><p>Predictable write performance.</p></li><li><p>Clear ownership of the state.</p></li></ul><h3>Observability and Monitoring</h3><p>A payment system that can&#8217;t be observed is a liability.</p><p>You must monitor:</p><ul><li><p>Ledger append latency</p></li><li><p>Ledger write error rates</p></li><li><p>Balance divergence (derived balances that no longer match a replay of the ledger)</p></li><li><p>Shard lag and replay backlog</p></li><li><p>Structured logs with transaction IDs</p></li></ul><p>Alerts should fire before balances drift, not after customers complain.</p><p>This is also a good time to ask your interviewer which metrics matter most to them, while sharing your own view on what to monitor.</p><h2><strong>Final Answer</strong></h2><div class="pullquote"><blockquote><p><em>&#8220;I&#8217;d design the system around an append-only ledger with account-level sharding. All payment writes go through an immutable transaction log, while balances are derived separately. Exactly-once semantics are enforced via idempotency keys, and recovery is handled through deterministic replay. The system scales horizontally through shard isolation and remains observable through ledger-level metrics and consistency checks. 
This allows it to support global payment volume without sacrificing correctness or auditability.&#8221;</em></p></blockquote></div><h2><strong>Final Thoughts</strong></h2><p>Designing a globally consistent payment ledger isn&#8217;t about frameworks.</p><p>It&#8217;s about discipline.</p><ul><li><p>Discipline in separating the source of truth from derived data</p></li><li><p>Discipline in enforcing invariants</p></li><li><p>Discipline in trading convenience for correctness</p></li></ul><p>If you understand this system, you understand real backend engineering.</p><p>And the next time an interviewer asks this question, you won&#8217;t hesitate; you&#8217;ll walk them through it calmly and confidently.</p><div><hr></div><h3>Join our AI Engineering Bootcamp</h3><p>The answer to the question &#8220;Will AI kill backend engineers?&#8221; boils down to this: engineers who embrace AI will replace backend engineers who do not.</p><p>If you&#8217;re ready to LEARN AI and EMBRACE AI, we are launching a 6-week AI Bootcamp: &#8220;Become a Production-Ready AI Engineer.&#8221;</p><p>We are backend engineers, and building production-ready systems has been our core skill. Learn exactly how to do the same as an AI Engineer in the AI-first world.</p><p><strong>Join here: <a href="https://masteringbackend.com/ai-bootcamp?ref=backendweekly">Become a Production-Ready AI Engineer</a></strong></p><div><hr></div><p>I hope you learned something today. Spread the love: share this newsletter with at least two of your friends today.</p><p>Also, let me know if you enjoy this new series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:<br><br><strong>Get a 50% discount on any of these courses. 
Reach out to me (reply to this email).</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com/">The MB Platform:</a> </strong>Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learning and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a> </strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. 
<a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</p><p></p>]]></content:encoded></item><item><title><![CDATA[Will AI kill Backend Engineers?]]></title><description><![CDATA[Yes. AI will replace backend engineers. Here's how and what to do to stand out.]]></description><link>https://kaperskyguru.substack.com/p/will-ai-kill-backend-engineers</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/will-ai-kill-backend-engineers</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 20 Dec 2025 17:24:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xviY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#8220;&#128075;&#8221;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a 
href="https://masteringbackend.com/?ref=backend-weekly&amp;utm_source=newsletter.masteringbackend.com&amp;utm_medium=newsletter&amp;utm_campaign=api-and-api-design-building-enterprise-apis&amp;_bhlid=c51e58543792da03dbe2c0ad9109a794492143ad">Masteringbackend</a><strong> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and AI engineers through a practice-based learning approach</strong></em>.</p><div><hr></div><p>If this newsletter was shared with you, consider subscribing here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://kaperskyguru.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://kaperskyguru.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Here&#8217;s another issue of <strong>Backend Weekly</strong>, your favorite newsletter on mastering backend engineering through real-world systems and interview design questions.</p><p><strong>Before we dive in:</strong></p><p>Enrollment for our upcoming AI Bootcamp has started, and seats are filling up very fast with engineers who are ready to embrace the AI-first world.</p><p>We are heading toward an AI-first world, where every platform and system you maintain will either be AI-enhanced or completely revamped around AI solutions. </p><p>As backend engineers, our role isn&#8217;t just to manage the persistence layer. </p><p>Most importantly, it is to architect the intelligence layer itself. 
</p><p>Many of us are still focused on optimizing the foundation while the entire skyscraper (the application) is being redesigned around us.</p><p><strong>Upskill now.</strong></p><p><a href="https://masteringbackend.com/ai-bootcamp">Click here to join </a><strong><a href="https://masteringbackend.com/ai-bootcamp?ref=backendweekly">&#8220;Becoming a Production-Ready AI Engineer.&#8221;</a></strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xviY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xviY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xviY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xviY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xviY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xviY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:300343,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/182172745?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xviY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xviY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xviY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!xviY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa36c8b55-0595-43a2-9683-36e3e727725e_2160x2160.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>If you have any questions about the bootcamp, please send them to hi [at] masteringbackend.com</strong></p><div><hr></div><p><strong>This is the AI for Developers Series on Backend Weekly every Monday.</strong></p><p><em>In this series, I will guide you through building with AI and becoming a production-ready backend engineer.</em></p><p>Let&#8217;s get started with episode 1.</p><div><hr></div><p>My question is simple, yet very difficult to answer.</p><p><strong>&#8220;Will AI kill Backend Engineers?&#8221;</strong></p><p>The shortest answer to this question is: <strong>YES</strong></p><p>Why?</p><p>From every point of view, AI is here to stay, and the only thing changing right now is how quickly it keeps improving.</p><p>The world will never go back to a time before AI. The improvements will keep coming.</p><p>With these improvements come new methods and patterns for backend engineers. </p><p>At some stage, AI will be capable of doing a lot more than it can today.</p><p>Therefore, a backend engineer who ignores the following points will be replaced:</p><ul><li><p>Embrace Fundamentals</p></li><li><p>Have a plan</p></li><li><p>Learn AI</p></li><li><p>Embrace AI</p></li></ul><h2>Embrace Fundamentals</h2><p>AI has disrupted many industries. In fact, it has disrupted every industry.</p><p>In the software industry, we have seen AI show up in two ways. Let me break them down.</p><ul><li><p>Building with AI</p></li><li><p>Building for AI</p></li></ul><h3>Building with AI</h3><p>The first approach covers engineers and businesses that use AI builder tools to plan, build, test, and scale their software applications.</p><p>For example, using Cursor and similar coding assistants to build applications. 
</p><p>This category also includes non-technical builders who rely entirely on prompts while the AI builds out their tools.</p><p>Not long ago, none of this existed. However, this new way is here to stay, and it will continue to evolve. </p><p>AI may eventually execute the complete SDLC without any human input beyond additional prompts.</p><p>We can all agree that it&#8217;s a possible future.</p><p>Now, let&#8217;s talk about the next category.</p><h3>Building for AI</h3><p>This category opens up new patterns, new methodologies, and new ways for engineers to use their existing skills and push beyond boundaries to stay relevant.</p><p>In this category, you build the tools that allow AI to function properly within a business&#8217;s logic.</p><p>For example, if company A is in e-commerce, it may decide to integrate AI into its processes. It would then bring in an AI engineer to handle the engineering side of the integration.</p><p>That&#8217;s where you come in:</p><p>You are called to use the fundamentals you already have as a backend engineer to build for AI. 
Build tools that AI will use to understand the business, run it better, connect it to other AI businesses, and so on.</p><p>Above all:</p><p>Whichever category you belong to, you need the fundamentals of backend engineering, at least while AI still relies heavily on human input.</p><p>That&#8217;s why I focus on the fundamentals first.</p><p>These are some of the questions I ask myself:</p><ul><li><p>Which technology is the best for this?</p></li><li><p>How do I break down this task so a junior dev can understand it?</p></li><li><p>What is the best way to build this?</p></li><li><p>Where do I need AI help?</p></li><li><p>What database is the best for this kind of product?</p></li><li><p>What is this principle of software engineering, and why does it matter?</p></li><li><p>etc.</p></li></ul><p>Questions like these help you make informed decisions. When you can articulate and document your processes for the AI, you will see it do wonders you can&#8217;t imagine. </p><p>But first, it has to come from you.</p><h2>Have a plan</h2><p>AI is coming for everybody, including YOU.</p><p>Don&#8217;t say:</p><p>&#8220;AI can&#8217;t replace or kill backend engineers, because &#8230;&#8230;. (add your reason).&#8221;</p><p>Things change. That&#8217;s the only constant.</p><p>Instead, have a plan.</p><p>To create a better plan, ask yourself these questions and answer them honestly:</p><ul><li><p>What do I do now that AI is here? 
</p></li><li><p>How do I remain relevant?</p></li><li><p>What are others in my industry doing now?</p></li><li><p>What is the new way forward?</p></li><li><p>What do I need to learn to level up?</p></li><li><p>etc.</p></li></ul><p>How many of us have seen firsthand a whole part of the industry wiped out by time and change?</p><p>Let me take you down memory lane.</p><ul><li><p>Remember COBOL?</p></li><li><p>Remember QBasic?</p></li><li><p>Remember Visual Basic (or VB.Net)?</p></li></ul><p>Even in backend engineering, you can easily see how much has changed, from how we used to build PHP applications back in the day to the Laravel days. </p><p>In frontend engineering, too, we moved from building with <code>index.html</code>, <code>style.css</code>, and <code>script.js</code> to Angular, React, Vue, and many others.</p><p>This new AI-first world is the same kind of change, so you need to level up and move in its direction so you don&#8217;t become obsolete.</p><p>Imagine someone who still builds backend systems with procedural PHP. </p><p>It will still work, but come on, man.</p><p>Now that you have a clear plan and see the relevance of joining the AI race, you need to LEARN AI.</p><h2>Learn AI</h2><p>There&#8217;s no sugar-coating it. You need to invest money, time, energy, and all you&#8217;ve got to level up.</p><p>Let me tell you why it&#8217;s worth it.</p><p>Think about the money, time, and energy you invested to learn backend engineering up to this point. </p><p>Be honest with yourself. Has it paid off or not?</p><p>Most of you know that it has paid off, even if your career in backend engineering has not started yet. Deep down, you know you&#8217;re in a career of great importance, and all you need is one opportunity.</p><p>You need to do the same in this new AI-first world. 
You need to invest all you&#8217;ve got to climb the ladder again, as others are doing.</p><p>So put your plan into action: start a new course next year, join a bootcamp, invest your time in this new paradigm shift, and wait for the opportunity to present itself.</p><p>If you don&#8217;t know where to start, here&#8217;s a good roadmap you can follow:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c5RH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c5RH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png 424w, https://substackcdn.com/image/fetch/$s_!c5RH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png 848w, https://substackcdn.com/image/fetch/$s_!c5RH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png 1272w, https://substackcdn.com/image/fetch/$s_!c5RH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c5RH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png" width="1456" height="1106" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1106,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:426237,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/182172745?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c5RH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png 424w, https://substackcdn.com/image/fetch/$s_!c5RH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png 848w, https://substackcdn.com/image/fetch/$s_!c5RH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png 1272w, https://substackcdn.com/image/fetch/$s_!c5RH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35715b2b-54fb-460f-ba50-f754324d4b7c_2264x1720.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><a href="https://roadmap.sh/ai-engineer">Click here to access the roadmap.</a></p><p>Also, <a href="https://masteringbackend.com">Mastering Backend</a> is hosting a webinar next week to answer all your questions and honestly point you in the right direction to <strong><a href="https://masteringbackend.com/ai-bootcamp?ref=backendweekly">become a production-ready AI Engineer</a>.</strong></p><p>Let me add this:</p><p>For most backend engineers, the problem is not learning; as backend engineers, we learn for a living.</p><p>The main problem is embracing AI.</p><h2>Embrace AI</h2><p>For some engineers, embracing AI is like allowing someone else to win the &#8220;React is better than Vue&#8221; war.</p><p>It&#8217;s that hard.</p><p>However, to stand out, you need to accept the new normal. 
If a task takes you 2 hours alone, it may take half that time with a well-trained colleague.</p><p>AI is your new colleague now, and this time, you have power over it. You can describe exactly what you want this new colleague to do, and it will do it.</p><p>You can tell it what not to do and how to do things, and that&#8217;s exactly what you will get.</p><p>Recently, I was building a fintech product, and I used AI to understand financial terms and how to model and calculate them in the backend.</p><p>I&#8217;m a software engineer, not a banker, so how did I build a complete banking application with all the calculations?</p><p>My AI colleague has experience not only in building software but also in finance. We combined our superpowers and delivered.</p><p>You have probably experienced something like this at some point, too.</p><p>Therefore, you need to let go of the old way and embrace this new way of building systems.</p><h2>Summary</h2><p>All of this boils down to one thing:</p><p>The answer to &#8220;Will AI kill backend engineers?&#8221; is that engineers who embrace AI will replace backend engineers who do not.</p><p>If you&#8217;re ready to LEARN AI and EMBRACE AI, we are launching a 6-week AI Bootcamp: &#8220;Become a Production-Ready AI Engineer.&#8221;</p><p>We are backend engineers, and building production-ready systems has been our core skill. Learn exactly how to do the same as an AI Engineer in the AI-first world.</p><p><strong>Join here: <a href="https://masteringbackend.com/ai-bootcamp?ref=backendweekly">Become a Production-Ready AI Engineer</a></strong></p><div><hr></div><p>I hope you learned something today. Spread the love. 
Share this newsletter with at least two of your friends today.</p><p>Also, let me know if you enjoy this new series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:<br><br><strong>Get a 50% discount on any of these courses. Reach out to me (reply to this email).</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com/">The MB Platform:</a> </strong>Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learning and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. 
<a href="https://masteringbackend.com/academy">The MB Academy:</a> </strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. <a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</p><p></p>]]></content:encoded></item><item><title><![CDATA[How would you design a distributed job queue system]]></title><description><![CDATA[What does it really mean to build a distributed job queueing system? 
Almost every backend engineer reading this has built a queueing system at some point.]]></description><link>https://kaperskyguru.substack.com/p/how-would-you-design-a-distributed-aac</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/how-would-you-design-a-distributed-aac</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 06 Dec 2025 15:15:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Fybi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#8220;&#128075;&#8221;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a href="https://masteringbackend.com/?ref=backend-weekly&amp;utm_source=newsletter.masteringbackend.com&amp;utm_medium=newsletter&amp;utm_campaign=api-and-api-design-building-enterprise-apis&amp;_bhlid=c51e58543792da03dbe2c0ad9109a794492143ad">Masteringbackend</a><strong> &#8594; An all-in-one platform that helps backend engineers become highly-paid backend and AI engineers by leveraging a practical-based learning approach</strong></em>.</p><div><hr></div><p>If this newsletter was shared with you, consider subscribing here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://kaperskyguru.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://kaperskyguru.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Here&#8217;s another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and interview design 
questions.</p><p><strong>Before we dive in:</strong></p><p>You&#8217;re in a backend interview.<br><br>They ask:<br><br>&#8220;How would you design a modern backend system in this AI era?&#8221;<br><br>Here&#8217;s how to approach it:</p><p>Shortest answer:<br><br>I will use <a href="https://blackbox.ai/?utm_source=twitter&amp;utm_medium=social&amp;utm_campaign=november2025&amp;utm_term=influencer&amp;utm_content=kaperskyguru">Blackbox.ai</a></p><ul><li><p>It generates production-ready code fast.</p></li><li><p>It runs in parallel, where different models work on my tasks, and the best one is picked.</p></li><li><p>It reads my entire codebase and understands it, makes informed decisions, and gets the work done.</p></li><li><p>It implements tasks on a production level with very detailed planning, high precision execution, and a thorough testing phase.</p></li></ul><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;b4aa7e6c-35ec-4a8e-bedf-3a4e17558a6f&quot;,&quot;duration&quot;:null}"></div><p>Check it out here: <a href="https://blackbox.ai/?utm_source=twitter&amp;utm_medium=social&amp;utm_campaign=november2025&amp;utm_term=influencer&amp;utm_content=kaperskyguru">Blackbox AI</a>.</p><div><hr></div><p><strong>This is the MB Interview Series on Backend Weekly every Saturday.</strong></p><p><em>In this series, I will guide you through answering common backend engineering interview questions, covering topics such as system design, microservices, API design, and databases.</em></p><p>Let&#8217;s get started with episode 4 (<a href="https://kaperskyguru.substack.com/p/how-would-you-design-a-rate-limiting">Episode 3 Here</a>):</p><div><hr></div><h2><strong>The Interview Scenario</strong></h2><p>You&#8217;re in a backend interview.</p><p>They ask:</p><p><em><strong>&#8220;How would you design a distributed job queue system that supports retries, prioritization, and horizontal 
scaling?&#8221;</strong></em></p><p>Here&#8217;s how to approach it:</p><div><hr></div><p>Before we dive in, we are building the next Interview Prep Playground for backend engineers.</p><p>Join our MB Interview waitlist: <a href="https://tally.so/r/w46glb">https://tally.so/r/w46glb</a></p><div><hr></div><p>What does it really mean to build a distributed job queueing system? Almost every backend engineer reading this has built a queueing system at some point.</p><p>But building one that works reliably across many servers is a much harder challenge.</p><p>The first thing to do when you hit this interview question is to clarify the requirements with your interviewer.</p><p>Let&#8217;s do just that:</p><h2>Clarify Requirements</h2><p>First, let&#8217;s define what a queueing system is:</p><p>A queueing system decouples the parts of an application that create work from the parts that perform it. Producers push tasks onto a queue, and workers pull them off and process them asynchronously.</p><p>Here&#8217;s a quick illustration:</p><ul><li><p><strong>User A</strong> triggers an action (e.g., uploads a file).</p></li><li><p><strong>Service B</strong> receives the request and creates a background job.</p></li><li><p>The job is pushed into the <strong>Queue</strong>.</p></li><li><p>A <strong>Worker</strong> pulls the next job from the queue.</p></li><li><p>The Worker processes the job and writes the result to the <strong>Database</strong>.</p></li></ul><p>Simple, right?</p><p>However, the requirements for our interview question are stricter. 
Your distributed queueing system must:</p><ul><li><p>Process background jobs reliably</p></li><li><p>Retry failed jobs with exponential backoff</p></li><li><p>Support job <strong>prioritization</strong> (high/medium/low)</p></li><li><p>Scale horizontally across workers</p></li><li><p>Process jobs idempotently, so running the same job twice is harmless</p></li><li><p>Offer strong observability (retries, failures, latency)</p></li></ul><p>Now that the requirements are clear, let&#8217;s define the architecture that satisfies them.</p><h2>Core Components of the System</h2><p>A production-ready distributed queue is typically built from four parts:</p><ul><li><p>Producers</p></li><li><p>Queue Service</p></li><li><p>Workers/Consumers</p></li><li><p>Storage</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fybi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fybi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png 424w, https://substackcdn.com/image/fetch/$s_!Fybi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png 848w, https://substackcdn.com/image/fetch/$s_!Fybi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Fybi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fybi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png" width="1415" height="880" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1415,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:111143,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/180232838?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fybi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png 424w, https://substackcdn.com/image/fetch/$s_!Fybi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png 848w, 
https://substackcdn.com/image/fetch/$s_!Fybi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png 1272w, https://substackcdn.com/image/fetch/$s_!Fybi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dfbad-6758-4bcc-9372-5d12020c2a9f_1415x880.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Producers</h3><p>Producers are APIs or services that submit jobs to the system. 
Each job carries metadata such as <strong>priority, delay, retry count, and payload</strong>.</p><pre><code>import { createClient } from "redis"; // a Kafka or RabbitMQ client would play the same role
import { randomUUID } from "node:crypto";

const redis = createClient();
redis.on("error", (err) =&gt; console.error("Redis error:", err));
await redis.connect(); // node-redis v4+ clients must connect before sending commands

type Priority = "high" | "medium" | "low";

export async function enqueueJob(payload: unknown, priority: Priority) {
  const jobData = {
    id: randomUUID(),
    payload,
    priority, // kept on the job so a retry can send it back to the right queue
    retries: 0,
    createdAt: Date.now()
  };

  // One list per priority: LPUSH here plus RPOP in the worker gives FIFO order
  await redis.lPush(`queue:${priority}`, JSON.stringify(jobData));

  console.log("Job added:", jobData.id);
}

// Usage

await enqueueJob({ email: "user@example.com" }, "high");</code></pre><h3>Queue Service</h3><p>The queue service can be backed by:</p><ul><li><p><strong>Redis</strong> (Lists, Sorted Sets, Streams)</p></li><li><p><strong>RabbitMQ</strong> (priority queues, routing keys)</p></li><li><p><strong>Kafka</strong> (partitioned, scalable pipelines)</p></li></ul><p>It stores enqueued jobs until a worker takes them. The ordering and durability you get depend on the backend: Kafka preserves order only within a partition, and Redis durability depends on its persistence settings (RDB snapshots or the AOF log).</p><h3>Workers/Consumers</h3><p>Workers pull jobs, run the business logic, acknowledge success (ACK) or report failure (NACK), schedule retries, and emit metrics and logs. Workers must be <strong>stateless</strong> so you can scale them horizontally.</p><p>Below is a minimal Redis-based worker. At its core, a worker is just an infinite loop that waits for jobs and processes them.</p><p>If you use RabbitMQ or Kafka instead, replace the Redis calls with that broker&#8217;s client library.</p><pre><code>import { createClient } from "redis";

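const redis = createClient();
redis.on("error", (err) =&gt; console.error("Redis error:", err));
await redis.connect();

// Placeholder for your business logic (hypothetical helper)
async function sendEmail(payload: unknown) {
  console.log("Sending email:", payload);
}

async function processQueue(priority: string) {
  while (true) {
    // rPop removes the job immediately: if this worker crashes mid-job, the job is lost
    const job = await redis.rPop(`queue:${priority}`);
    if (!job) {
      await new Promise((r) =&gt; setTimeout(r, 500)); // queue is empty: poll again shortly
      continue;
    }

    const data = JSON.parse(job);

    try {
      console.log("Processing:", data.id);

      // Your business logic here
      await sendEmail(data.payload);

      console.log("SUCCESS:", data.id);
    } catch (err) {
      data.retries += 1;

      if (data.retries &gt; 3) {
        // Out of retries: park the job in the dead-letter queue for inspection
        await redis.lPush("queue:deadletter", JSON.stringify(data));
        console.log("Moved to DLQ:", data.id, err);
      } else {
        const delay = 2 ** data.retries * 1000; // 2s, 4s, 8s
        console.log(`Retrying ${data.id} in ${delay}ms`);

        // Note: this timer lives only in this process, so the retry is lost if the worker restarts
        setTimeout(() =&gt; {
          redis.lPush(`queue:${priority}`, JSON.stringify(data)).catch(console.error);
        }, delay);
      }
    }
  }
}

processQueue("high").catch(console.error);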
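
// ----------------------------------------------------------------------
// Sketch (hypothetical names, in-memory stand-in): making ACK explicit.
// rPop deletes a job the moment it is read. A common Redis pattern is to
// atomically move the job to a "processing" list instead (LMOVE ... RIGHT
// LEFT) and delete it only after success (LREM). Plain arrays stand in for
// the two Redis lists here so the logic runs without a server.
// ----------------------------------------------------------------------
class ReliableListQueue {
  pending: string[] = [];    // stands in for queue:high (newest job at the front)
  processing: string[] = []; // stands in for queue:high:processing

  push(job: string) {
    this.pending.unshift(job); // like LPUSH
  }

  // Like LMOVE queue:high queue:high:processing RIGHT LEFT:
  // take the oldest job and park it in the processing list.
  claim() {
    const job = this.pending.pop();
    if (job === undefined) return undefined;
    this.processing.unshift(job);
    return job;
  }

  // ACK: the job succeeded, so drop it from the processing list (like LREM).
  ack(job: string) {
    const i = this.processing.indexOf(job);
    if (i !== -1) this.processing.splice(i, 1);
  }

  // Recovery after a worker crash: jobs still in the processing list were
  // never acknowledged, so put them back where claim() reads next, oldest first.
  requeueUnacked() {
    for (const job of this.processing) this.pending.push(job);
    this.processing = [];
  }
}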
</code></pre><h3>Storage</h3><p>You can spin up a Postgres, Redis, or Elasticsearch instance to store the result of each job. Store the following information for metrics and debugging:</p><ul><li><p>Job history</p></li><li><p>Retry attempts</p></li><li><p>Failures</p></li><li><p>Dead-letter queue entries</p></li></ul><p>These four components form the core of a distributed queueing system. Get them right, and the remaining features (retries, prioritization, scaling) build on top of them.</p><p>Next, walk through the job lifecycle. It shapes the rest of your design and gives your interview answer a clear structure:</p><h3>The Job Lifecycle</h3><p>Every job your queue system processes passes through this lifecycle, so it&#8217;s worth understanding each stage and where other services plug in.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xYlH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xYlH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png 424w, https://substackcdn.com/image/fetch/$s_!xYlH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png 848w, https://substackcdn.com/image/fetch/$s_!xYlH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xYlH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xYlH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png" width="441" height="631" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:631,&quot;width&quot;:441,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30779,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/180232838?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xYlH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png 424w, https://substackcdn.com/image/fetch/$s_!xYlH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png 848w, https://substackcdn.com/image/fetch/$s_!xYlH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png 
1272w, https://substackcdn.com/image/fetch/$s_!xYlH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c097120-cd84-41c4-ad18-d66223fc9c46_441x631.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ol><li><p><strong>A producer submits the job</strong></p></li><li><p><strong>Queue enqueues the job</strong> with metadata</p></li><li><p><strong>A worker/consumer pulls the job</strong></p></li><li><p>A worker <strong>processes</strong> the job</p></li><li><p>Worker <strong>ACKs (Acknowledge)</strong> if successful</p></li><li><p>Worker 
<strong>requeues</strong> the job with backoff if it fails</p></li><li><p>Permanently failed jobs go to a <strong>Dead Letter Queue (DLQ)</strong></p></li></ol><p>This simple lifecycle is the backbone of every large-scale queueing system.</p><h2>Retries and Backoff in Queues</h2><p>Retries are central to every queueing system, so raise them with your interviewer early. Design the retry and backoff strategy carefully: retrying failed jobs immediately and repeatedly can hammer an already struggling downstream service.</p><p>Use <strong>exponential backoff</strong>:</p><ul><li><p>Double the delay after each failure: 2s &#8594; 4s &#8594; 8s &#8594; 16s &#8594; &#8230;</p></li><li><p>Store retry metadata with the job: attempt count and next-run timestamp</p></li><li><p>Log every retry for observability</p></li><li><p>After N failures, move the job to the DLQ</p></li></ul><p><strong>DLQs</strong> let you inspect and debug permanently failing jobs without blocking the rest of the queue.</p><p>Here is the retry branch from the worker code above:</p><pre><code>  const delay = 2 ** data.retries * 1000;
  console.log(`Retrying ${data.id} in ${delay}ms`);

  setTimeout(() =&gt; {
    redis.lPush(`queue:${priority}`, JSON.stringify(data));
   }, delay);
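
// ----------------------------------------------------------------------
// Sketch (hypothetical names, in-memory stand-in): durable delayed retries.
// A setTimeout lives only in the worker's memory, so a restart drops the
// retry. A common Redis pattern stores the job in a sorted set scored by its
// next-run time (ZADD queue:delayed runAt job). A scheduler loop reads due
// jobs (ZRANGE queue:delayed -inf now BYSCORE), removes each with ZREM, and
// LPUSHes it back to its priority list only if ZREM returned 1, so two
// schedulers never promote the same job. A sorted array stands in for the set.
// ----------------------------------------------------------------------
type DelayedJob = { id: string; priority: string; retries: number; runAt: number };

const delayed: DelayedJob[] = []; // stands in for the queue:delayed sorted set

// Record the failure and schedule the next attempt with exponential backoff.
function scheduleRetry(job: DelayedJob, now: number) {
  job.retries += 1;
  job.runAt = now + 2 ** job.retries * 1000; // 2s, 4s, 8s, ...
  delayed.push(job);
  delayed.sort((a, b) => a.runAt - b.runAt); // keep ordered by score, like a sorted set
}

// Remove and return every job whose runAt is due; the caller would push
// each one back onto `queue:${job.priority}`.
function promoteDueJobs(now: number) {
  const ready: DelayedJob[] = [];
  while (delayed.length > 0) {
    if (delayed[0].runAt > now) break; // earliest job is not due yet
    ready.push(delayed.shift() as DelayedJob);
  }
  return ready;
}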
</code></pre><p>Then, after N failures:</p><pre><code>  await redis.lPush("queue:deadletter", JSON.stringify(data));
  console.log("Moved to DLQ:", data.id);
</code></pre><h3>Reliability &amp; Idempotency</h3><p>Retries mean the same job can run more than once. A worker may crash after finishing the work but before acknowledging it, or a failed attempt may have partially succeeded. Idempotency makes those repeats harmless, which is what makes retries safe.</p><p>One simple approach is to check for a processed marker before running the job:</p><pre><code>const exists = await redis.exists(`processed:${job.id}`);
if (exists) return; // skip duplicate</code></pre><p>Then record the job ID only after the job succeeds, so a failed attempt can still be retried (here the marker expires after an hour):</p><pre><code>await redis.set(`processed:${job.id}`, "1", { EX: 3600 });</code></pre><p>Note that this check-then-set is not atomic: two workers that receive the same job at the same moment can both pass the check. Where that matters, make the side effect itself idempotent (for example, a unique constraint on the job ID in your database).</p><h2>Job Prioritization</h2><p>Job prioritization is another feature you should discuss with your interviewer. It decides which job is processed next.</p><p>By default, a queue is First-In, First-Out (FIFO): jobs are processed in the order they arrive.</p><p>With priority queues, each job gets a priority level, and workers always take higher-priority jobs first.</p><p>Here&#8217;s an example with one Redis list per priority:</p><pre><code>queue:high
queue:medium
queue:low</code></pre><p>In JavaScript, list the queues in priority order:</p><pre><code>const PRIORITY_ORDER = ["high", "medium", "low"];</code></pre><p>Then have each worker pop from all three lists in that order. Redis BRPOP checks its keys in the order given and pops from the first non-empty list, so queue:high always wins when it has work:</p><pre><code>// Use a dedicated client: a blocking command holds its connection until it returns
const result = await redis.brPop(PRIORITY_ORDER.map((p) =&gt; `queue:${p}`), 0); // 0 = wait forever
if (result) {
  const data = JSON.parse(result.element);
  console.log(`Got ${data.id} from ${result.key}`);
}</code></pre><p>Strict ordering like this can starve low-priority jobs when high-priority traffic never lets up. Weighted consumption (checking lower-priority queues first a fixed share of the time) avoids that.</p><p>Also note that RabbitMQ supports priority queues natively (declare the queue with the x-max-priority argument). In Redis, besides one list per priority, you can use a single sorted set whose score combines priority and enqueue timestamp.</p><h2>Delayed &amp; Scheduled Jobs</h2><p>Sometimes your queue needs to hold jobs back on purpose, for example:</p><ul><li><p>Sending emails after X minutes</p></li><li><p>Running scheduled tasks</p></li><li><p>Waiting out a retry backoff</p></li></ul><p>In cases like this, use:</p><ul><li><p><strong>Redis Sorted Sets</strong> (score = the timestamp when the job should run)</p></li><li><p>A scheduler worker that moves a job onto its regular queue once its score &lt;= now</p></li></ul><h2>Horizontal Scalability</h2><p>When implementing your workers, make sure that:</p><ul><li><p>Workers are stateless</p></li><li><p>All shared state lives in Redis (or your queue backend)</p></li><li><p>Any worker can pick up any job</p></li><li><p>You can auto-scale on CPU, memory, or queue depth</p></li></ul><p>This lets you scale by:</p><ul><li><p>Adding more worker nodes</p></li><li><p>Using Kafka partitions or Redis Streams to spread work across consumers</p></li><li><p>Using <strong>consumer groups</strong> for shared consumption</p></li><li><p>Using distributed locks to prevent double-processing</p></li></ul><p>Throughput grows roughly in proportion to the number of workers until the queue backend or a downstream dependency becomes the bottleneck. This is the general pattern large-scale job systems use.</p><h2>Observability &amp; Monitoring</h2><p>You can&#8217;t manage what you can&#8217;t see.</p><p>Ask your interviewer which metrics matter most for their use case. Here are the metrics worth tracking and the tools to use:</p><p>Track:</p><ul><li><p>Queue sizes</p></li><li><p>Processing latency</p></li><li><p>Retry counts</p></li><li><p>Worker failures</p></li><li><p>DLQ counts</p></li><li><p>Overall 
throughput</p></li></ul><p>Use tools such as:</p><ul><li><p>Prometheus</p></li><li><p>OpenTelemetry</p></li><li><p>Grafana dashboards</p></li></ul><p>Logs must include the <strong>job ID</strong>, <strong>worker ID</strong>, <strong>payload</strong>, <strong>duration</strong>, and <strong>retry count</strong>.</p><h2><strong>Final Answer</strong></h2><div class="pullquote"><p>&#8220;I&#8217;d build a distributed job queue using Redis Streams or Kafka. Producers submit jobs with metadata like priority, delay, and retries. Workers consume jobs using consumer groups, remain stateless, and retry via exponential backoff. Failed jobs go to a DLQ. The system scales horizontally by adding workers, supports prioritization via multiple queues or weighted consumption, and exposes observability metrics for retries, queue size, and latency.&#8221;</p></div><p>Designing a distributed job queue <em>sounds simple</em> &#8212; but at scale, it becomes a deep dive into:</p><ul><li><p>concurrency</p></li><li><p>ordering</p></li><li><p>durability guarantees</p></li><li><p>backoff scheduling</p></li><li><p>partitioning</p></li><li><p>idempotency</p></li><li><p>observability</p></li><li><p>fault tolerance</p></li></ul><p>These are the kinds of systems that separate <strong>junior</strong> from <strong>senior</strong> backend engineers.</p><p>Master this, and you&#8217;ll stand out in every interview.</p><div><hr></div><p>I hope you learned something today: Spread the love. Share this newsletter with at least two of your friends today.</p><p>Also, let me know if you enjoy this new series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:<br><br><strong>Get a 50% discount on any of these courses. 
Reach out to me (Reply to this mail)</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com/">The MB Platform:</a> </strong>Join 4000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a> </strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. 
<a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com/">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[How would you design an authentication system for a large-scale web application?]]></title><description><![CDATA[A step-by-step blueprint for building secure, scalable authentication systems for modern web apps.]]></description><link>https://kaperskyguru.substack.com/p/how-would-you-design-an-authentication</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/how-would-you-design-an-authentication</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sun, 23 Nov 2025 12:56:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!lw2B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#8220;&#128075;&#8221;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a 
href="https://masteringbackend.com/?ref=backend-weekly&amp;utm_source=newsletter.masteringbackend.com&amp;utm_medium=newsletter&amp;utm_campaign=api-and-api-design-building-enterprise-apis&amp;_bhlid=c51e58543792da03dbe2c0ad9109a794492143ad">Masteringbackend</a><strong> &#8594; An all-in-one platform that helps backend engineers become highly-paid backend and AI engineers by leveraging a practical-based learning approach</strong></em>.</p><div><hr></div><p>If this newsletter was shared with you, consider subscribing here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://kaperskyguru.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://kaperskyguru.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Here&#8217;s another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and interview design questions.</p><p><strong>Before we dive in:</strong></p><p>If you are still stuck in tutorial hell</p><p>We&#8217;ve all been there&#8212;jumping from one Python video to the next, but never building anything real. No portfolio. No confidence. No interviews.</p><p>That&#8217;s why I created this.</p><p><strong>The &#8220;Land Your Dream Python Job&#8221; Challenge</strong><br>A 90-day, 3-phase roadmap that helps you:</p><p>&#9989; Build 30 real-world backend projects in 30 days<br>&#9989; Master DSA for technical interviews<br>&#9989; Get job-ready with resumes, mock interviews &amp; daily job alerts<br>&#9989; And finally... land that backend job</p><p>This is NOT another course. It&#8217;s a challenge. And it works. 
It&#8217;s beginner-friendly.</p><p>Over <strong>2,000 Python developers</strong> have taken this path&#8212;many are now working at top companies.</p><p><strong>Only 120 slots left at $54 (then goes up to $100)</strong></p><p>Join the challenge &amp; change your future<br>&#128073; <a href="https://python30.masteringbackend.com/?utm_source=newsletter.masteringbackend.com&amp;utm_medium=referral&amp;utm_campaign=api-and-api-design-api-security-part-2">python30.masteringbackend.com</a></p><p>Let&#8217;s get you unstuck.</p><div><hr></div><p><strong>This is the MB Interview Series on Backend Weekly every Saturday.</strong></p><p><em>In this series, I will guide you through answering common backend engineering interview questions, covering topics such as system design, microservices, API design, and databases.</em></p><p>Let&#8217;s get started with episode 4 (<a href="https://kaperskyguru.substack.com/p/how-would-you-design-a-rate-limiting">Episode 3 Here</a>):</p><div><hr></div><h2>The Interview Scenario</h2><p>You&#8217;re in a backend interview.</p><p>They ask:</p><p><strong>&#8220;How would you design an authentication system for a large-scale web application?&#8221;</strong></p><p>Here&#8217;s how you should approach it:</p><div><hr></div><blockquote><p>Before we dive in, we are building the next Interview Prep Playground targeting backend engineers.</p><p>Join our MB Interview waitlist: <a href="https://tally.so/r/w46glb">https://tally.so/r/w46glb</a></p></blockquote><div><hr></div><p>The shortest answer to this question will be:</p><p>Don&#8217;t do it. 
</p><p>Use an existing solution.</p><p>However, for learning purposes, we will break down the problem of designing a robust authentication system to understand how it works.</p><h2>Understanding the problem</h2><p>Building an authentication flow is simple.</p><p>Designing a secure, scalable, and resilient authentication system that millions of users rely on every day is a whole new level, and it comes with its own set of challenges.</p><p>To understand the problem properly, start with the questions that clarify how the system should be designed:</p><ul><li><p>How do users sign up?</p></li><li><p>How are credentials stored?</p></li><li><p>How do you manage sessions at scale?</p></li><li><p>How do mobile apps authenticate differently from browsers?</p></li><li><p>How do you prevent abuse, attacks, replay, token theft, and session hijacking?</p></li><li><p>How do you scale auth to millions of requests per day?</p></li></ul><p>To answer these questions, break the system down into four pillars of authentication:</p><ul><li><p><strong>Identity</strong> (who the user is)</p></li><li><p><strong>Authentication</strong> (verifying they are who they say they are)</p></li><li><p><strong>Session management</strong> (keeping them logged in)</p></li><li><p><strong>Security</strong> (protecting credentials, tokens, and systems)</p></li></ul><p>Once you have identified who the user is, you can decide how to verify them and keep them logged in while protecting their credentials and tokens.</p><p>That is the basic idea of authentication.</p><p>Now, how do you manage this on a large scale? 
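</p><p>Before answering that, it helps to see the four pillars in their simplest form. The sketch below is a hypothetical, single-process TypeScript illustration rather than production code: in-memory Maps stand in for the identity store and session store, Node&#8217;s built-in scrypt hashes passwords, and a random opaque token with an expiry stands in for the session. The names signUp, logIn, authenticate, and logOut are made up for this example.</p><pre><code>import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Hypothetical single-process sketch; the Maps stand in for real stores.
type UserRecord = { id: string; email: string; salt: string; passwordHash: string };

// Identity: who the user is (keyed by email)
const users = new Map&lt;string, UserRecord&gt;();
// Session management: opaque token mapped to a user id and an expiry time
const sessions = new Map&lt;string, { userId: string; expiresAt: number }&gt;();

// Security: store only a salted scrypt hash, never the password itself
function hashPassword(password: string, salt: string): string {
  return scryptSync(password, salt, 64).toString("hex");
}

function signUp(email: string, password: string): UserRecord {
  const salt = randomBytes(16).toString("hex");
  const user: UserRecord = {
    id: randomBytes(8).toString("hex"),
    email,
    salt,
    passwordHash: hashPassword(password, salt),
  };
  users.set(email, user);
  return user;
}

// Authentication: verify the claim, then issue a short-lived session token
function logIn(email: string, password: string, ttlMs = 15 * 60 * 1000): string | null {
  const user = users.get(email);
  if (!user) return null;
  const expected = Buffer.from(user.passwordHash, "hex");
  const actual = Buffer.from(hashPassword(password, user.salt), "hex");
  // Constant-time comparison, so timing does not reveal how many bytes matched
  if (!timingSafeEqual(expected, actual)) return null;
  const token = randomBytes(32).toString("base64url");
  sessions.set(token, { userId: user.id, expiresAt: Date.now() + ttlMs });
  return token;
}

// Every later request presents the token instead of the password
function authenticate(token: string): string | null {
  const session = sessions.get(token);
  if (!session || session.expiresAt &lt;= Date.now()) return null;
  return session.userId;
}

function logOut(token: string): void {
  sessions.delete(token);
}</code></pre><p>Each pillar shows up once: the users Map is identity, logIn is authentication, the expiring token is session management, and salted hashing with a constant-time comparison is security. Everything in the architecture below exists to make these same steps work across many instances and regions, and against real attackers. 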
</p><p>Let&#8217;s look at this architecture for a simple, large-scale authentication system.</p><h2>The High-Level Architecture</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lw2B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lw2B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png 424w, https://substackcdn.com/image/fetch/$s_!lw2B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png 848w, https://substackcdn.com/image/fetch/$s_!lw2B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png 1272w, https://substackcdn.com/image/fetch/$s_!lw2B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lw2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png" width="621" height="761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:621,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48882,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/179657224?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lw2B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png 424w, https://substackcdn.com/image/fetch/$s_!lw2B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png 848w, https://substackcdn.com/image/fetch/$s_!lw2B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png 1272w, https://substackcdn.com/image/fetch/$s_!lw2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53c2c61d-dd7d-45fb-aaae-f5c087272dbf_621x761.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A real-world large-scale authentication system is made of:</p><ul><li><p>Authentication Service</p></li><li><p>Identity Store</p></li><li><p>Token System</p></li><li><p>Session Store (optional)</p></li><li><p>API Gateway</p></li><li><p>MFA + Device Management</p></li></ul><p>Let&#8217;s explore each of these layers to give you a clear picture.</p><h3>Authentication Service</h3><p>The authentication service is where your logic happens. It is a backend service written in any server-side language. It handles the business logic of authenticating a user. 
This service must be <strong>stateless</strong>, horizontally scalable, and secured behind a gateway.</p><p>It handles the following, depending on the business&#8217;s needs:</p><ul><li><p>Login</p></li><li><p>Signup</p></li><li><p>Token issuance</p></li><li><p>MFA</p></li><li><p>Refresh tokens</p></li><li><p>Password resets</p></li><li><p>Device tracking</p></li></ul><p>Here&#8217;s a simple code snippet for logging in a user. It assumes a zod loginSchema, a Prisma client, and comparePassword and createToken helpers defined elsewhere in the project:</p><pre><code>import { z } from "zod";
// loginSchema, prisma, comparePassword, createToken and the Handlers type
// are assumed to be defined or imported elsewhere in the project.

export const handler: Handlers["Login"] = async (req, { emit, logger }) =&gt; {
  try {
    const validatedData = loginSchema.parse(req.body);

    // Find the user by email (assumes a unique constraint on email)
    const user = await prisma.user.findUnique({
      where: { email: validatedData.email },
    });

    // Use one generic error for "unknown email" and "wrong password"
    // so the response body does not reveal which emails are registered
    const passwordMatch = user
      ? await comparePassword({
          password: validatedData.password,
          hashed: user.password,
        })
      : false;
    if (!user || !passwordMatch) {
      return {
        status: 401,
        body: { message: "Invalid email or password", success: false },
      };
    }

    const token = createToken({ userId: user.id });

    // Never send the password hash to the client, the logs, or other services
    const { password: _password, ...safeUser } = user;

    if (logger) {
      logger.info("User logged in", { userId: user.id });
    }

    if (emit) {
      await (emit as any)({
        topic: "user.loggedin",
        data: { userId: user.id },
      });
    }

    return {
      status: 200,
      body: {
        message: "User logged in successfully",
        success: true,
        data: {
          user: safeUser,
          token,
        },
      },
    };
  } catch (error) {
    if (error instanceof z.ZodError) {
      // Return validation errors with proper HTTP status
      return {
        status: 400,
        body: {
          message: "Validation error",
          errors: error.issues,
        },
      };
    }

    // Log the real cause, but return a generic message to the client
    if (logger) {
      logger.error("User login failed", {
        error: error instanceof Error ? error.message : "Unknown error",
      });
    }

    return {
      status: 500,
      body: {
        message: "Internal server error",
      },
    };
  }
};</code></pre><h3>Identity Store</h3><p>The identity store is your database, or whatever storage system you use to persist user data and login information.</p><p>This is where your user data lives:</p><ul><li><p>Email</p></li><li><p>Hashed passwords (hashed with Argon2, bcrypt, or scrypt; never stored in plain text)</p></li><li><p>MFA secrets</p></li><li><p>Device fingerprints</p></li><li><p>OAuth identities</p></li></ul><p>The login handler above uses Prisma ORM to fetch the user record from the database before comparing password hashes:</p><pre><code>// Prisma ORM: look up the identity record by email
const user = await prisma.user.findUnique({
  where: { email: validatedData.email },
});</code></pre><p>You can use any database or data store available to you or agreed upon with your interviewer.</p><p>Here are some data stores worth discussing with your interviewer, along with the main strength of each:</p><ul><li><p>PostgreSQL (strong consistency)</p></li><li><p>DynamoDB (regional scaling)</p></li><li><p>LDAP (enterprise)</p></li><li><p>Firebase Auth (managed)</p></li></ul><h3>Token System</h3><p>Your token system can be a simple function that issues a login token on the fly, or a separate system that creates and manages the whole token lifecycle: access tokens, refresh tokens, expiry, and revocation.</p><p>Common approaches for generating and managing user tokens include:</p><ul><li><p>JWT (stateless, widely used by modern APIs)</p></li><li><p>Opaque tokens + DB/session store</p></li><li><p>OAuth2 access + refresh tokens</p></li></ul><p>The goal of a token system is to handle everything token-related for your users in one place.</p><p>Here&#8217;s a simple snippet that uses JWT to generate user tokens:</p><pre><code>export function setJWTOAuth(user: any): any {
  const token = jwt.sign({ userId: user?.id }, environment.auth.tokenSecret, {
    expiresIn: "30 days",
  });
  const thirtyDays = 30 * 24 * 60 * 60 * 1000;

  // Treat every environment except local and develop as production
  const isProd = !["LOCAL", "DEVELOP"].includes(environment.context);

  const options: Option = {
    httpOnly: true, // cookie is not readable from client-side JavaScript
    expires: new Date(Date.now() + thirtyDays),
    // SameSite=None allows cross-site requests; browsers require Secure with it
    sameSite: isProd ? "none" : false,
    secure: isProd,
  };

  // Share the cookie across subdomains in production
  if (isProd) options.domain = ".example.com";

  return {
    options,
    token,
  };
}</code></pre><p>Here, we sign a JWT and return it together with matching cookie options, so the caller can set it as an HttpOnly cookie and, if needed, also return it in the response body. A 30-day token is long-lived; in the login flow described later, this role is usually split between a short-lived access token and a long-lived refresh token.</p><h3>Session Store</h3><p>A session store is optional because your authentication flow should be stateless: the backend holds no state about logged-in users. That is the essence of a token-based system, especially one built on JWT.</p><p>However, in some cases your backend does need to keep state, or you and your interviewer may agree to design the system around server-side sessions.</p><p>A session store is usually needed if your app requires:</p><ul><li><p>Server-side invalidation (for example, logging a user out from the server)</p></li><li><p>Instant logout from every device</p></li><li><p>Revocation lists</p></li></ul><h3>API Gateway</h3><p>In a large-scale authentication system, an API gateway is important. It sits between clients and your services and performs many important tasks, such as:</p><ul><li><p>Token validation</p></li><li><p>Rate limiting</p></li><li><p>IP allow/deny</p></li><li><p>Bot filtering</p></li><li><p>Throttling</p></li></ul><p>This is where tokens are validated (and coarse-grained authorization checks applied) before a request is routed to the right service.</p><h3>MFA + Device Management</h3><p>In some applications, multi-factor authentication (MFA) is optional, so discuss with your interviewer whether this feature is needed.</p><p>However, as a general recommendation, MFA should not be optional when you are building a large-scale enterprise application.</p><p>Device management, on the other hand, is about storing information about your users&#8217; devices so you can recognize, limit, or revoke them. 
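</p><p>A minimal sketch of device management, assuming a hypothetical in-memory DeviceRegistry class (in a real system these records would live in the identity store or Redis): each user has a set of known devices, and a configurable limit decides whether a sign-in from a new device is allowed.</p><pre><code>// Hypothetical in-memory registry; in production this lives in the identity store or Redis.
type DeviceInfo = { deviceId: string; userAgent: string; lastSeen: number };

class DeviceRegistry {
  // userId mapped to that user's known devices (keyed by deviceId)
  private readonly devices = new Map&lt;string, Map&lt;string, DeviceInfo&gt;&gt;();
  private readonly maxDevices: number;

  constructor(maxDevices: number) {
    this.maxDevices = maxDevices;
  }

  // Records a sign-in; returns false if this is a new device and the user is at the limit
  register(userId: string, deviceId: string, userAgent: string): boolean {
    const userDevices = this.devices.get(userId) ?? new Map&lt;string, DeviceInfo&gt;();
    const isKnown = userDevices.has(deviceId);
    if (!isKnown &amp;&amp; userDevices.size &gt;= this.maxDevices) return false;
    userDevices.set(deviceId, { deviceId, userAgent, lastSeen: Date.now() });
    this.devices.set(userId, userDevices);
    return true;
  }

  // Revoking a device should also revoke its refresh tokens (not shown here)
  revoke(userId: string, deviceId: string): boolean {
    return this.devices.get(userId)?.delete(deviceId) ?? false;
  }

  list(userId: string): DeviceInfo[] {
    const userDevices = this.devices.get(userId);
    return userDevices ? Array.from(userDevices.values()) : [];
  }
}</code></pre><p>Creating the registry with a limit of 1 allows a single active device per user, a larger limit allows several at once, and revoking a device frees up a slot. 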
</p><p>A common use case is deciding whether to allow multiple devices to be connected to your application at the same time.</p><div><hr></div><p>Next, let&#8217;s look at a visual representation of the authentication flow:</p><h2>How Authentication Flows</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y1g9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y1g9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png 424w, https://substackcdn.com/image/fetch/$s_!y1g9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png 848w, https://substackcdn.com/image/fetch/$s_!y1g9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png 1272w, https://substackcdn.com/image/fetch/$s_!y1g9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y1g9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png" width="761" height="194" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47de2a00-23d1-4251-ae57-987267181e14_761x194.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:194,&quot;width&quot;:761,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23299,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/179657224?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y1g9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png 424w, https://substackcdn.com/image/fetch/$s_!y1g9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png 848w, https://substackcdn.com/image/fetch/$s_!y1g9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png 1272w, https://substackcdn.com/image/fetch/$s_!y1g9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47de2a00-23d1-4251-ae57-987267181e14_761x194.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The flow is simple:</p><ul><li><p>Signup Flow</p></li><li><p>Login Flow</p></li><li><p>Refresh Token Flow</p></li><li><p>Logout Flow</p></li></ul><h3>Signup Flow</h3><p>When a user sends a 
request to the backend with valid signup details, your backend service follows these steps:</p><p><strong>Steps:</strong></p><ol><li><p>Validate inputs</p></li><li><p>Hash password with Argon2</p></li><li><p>Create user record</p></li><li><p>Send verification email (Queue)</p></li><li><p>Return success</p></li></ol><h3>Login Flow</h3><p>When a user sends a login request with valid credentials, your backend service follows these steps:</p><p><strong>Steps:</strong></p><ol><li><p>Fetch user</p></li><li><p>Compare hash</p></li><li><p>Generate access token (short-lived)</p></li><li><p>Generate refresh token (long-lived)</p></li><li><p>Store refresh token in Redis/DB (required if you want to rotate or revoke it)</p></li><li><p>Return tokens and user metadata</p></li></ol><h3>Refresh Token Flow</h3><p>When a user&#8217;s access token expires, the client sends its refresh token to the backend, and the authentication service checks:</p><ul><li><p>Is this refresh token valid?</p></li><li><p>Is it revoked?</p></li><li><p>Is the device recognized?</p></li></ul><p>If valid:</p><ul><li><p>Issue a new access token</p></li><li><p>Issue a new refresh token (rotation)</p></li><li><p>Revoke the old one</p></li></ul><h3>Logout Flow</h3><p>When a user sends a logout request, your backend service follows these steps:</p><p>If using server-side sessions:</p><ul><li><p>Delete the session or token from Redis</p></li></ul><p>If using JWT:</p><ul><li><p>Clear the token on the client (for example, expire the cookie); a signed JWT cannot be invalidated on its own</p></li><li><p>Add the token to a <strong>revocation list</strong> until it would have expired (short TTL)</p></li><li><p>Or rely on short-lived access tokens and revoke the refresh token</p></li></ul><div><hr></div><p>Next, let&#8217;s explore the different authentication methods and which one to use in different cases.</p><h2>What Authentication Method Should You Use?</h2><p>There are many authentication methods and strategies to choose from. 
However, here are the common strategies:</p><h3>JWT (JSON Web Tokens)</h3><p>JWT is one of the most common strategies because it is stateless: any service holding the verification key can check a user&#8217;s token without querying the database, which makes it a great fit for distributed, multi-region services. The trade-off is that a JWT stays valid until it expires unless you also keep a revocation list.</p><p>Here&#8217;s a simple code snippet to generate a JWT:</p><pre><code>import jwt from "jsonwebtoken";

// TOKENSECRET is the signing secret, loaded from configuration rather than hard-coded
const token = jwt.sign({ userId: user?.id }, TOKENSECRET, {
  expiresIn: "1 day",
});</code></pre><h3>Opaque Tokens and DB Store</h3><p>This is a simple, secure, and server-controlled authentication strategy.</p><p>Opaque tokens are <strong>random, meaningless strings</strong> issued by the server to represent a user&#8217;s session. Unlike JWTs, they <strong>contain zero user data</strong>.</p><p>Their only purpose is to act as a key to look up session information stored on the backend.</p><p>Here&#8217;s a simple code snippet that generates a strong opaque token from a cryptographically secure random source and records the session on the server. The in-memory Map is only a stand-in for Redis or a sessions table, and lookupToken is a hypothetical helper that shows the lookup side:</p><pre><code>import { randomBytes } from "node:crypto";

// TokenData is the app's own session payload type (for example { userId: string }).
// Stand-in session store; a real deployment shares Redis or a DB table across instances.
const sessions = new Map&lt;string, { data: Partial&lt;TokenData&gt;; expiresAt: number }&gt;();

export const createToken = (data: Partial&lt;TokenData&gt;): string =&gt; {
  // 32 random bytes = 256 bits of entropy; the token itself carries no user data
  const token = randomBytes(32).toString("base64url");

  // The session data stays on the server, keyed by the token, for 1 hour.
  // In production, key the record by a hash of the token instead of the raw token.
  sessions.set(token, { data, expiresAt: Date.now() + 3600 * 1000 });

  return token;
};

// Every later request presents the token; the server looks the session up
export const lookupToken = (token: string): Partial&lt;TokenData&gt; | null =&gt; {
  const session = sessions.get(token);
  if (!session || session.expiresAt &lt;= Date.now()) return null;
  return session.data;
};</code></pre><h3>Session Cookies (Web)</h3><p>Session cookies are a classic, battle-tested authentication model for web applications. If you&#8217;re building a web-based application, I will always recommend session cookies.</p><p>Session cookies rely on a simple idea:</p><blockquote><p>The server creates a session and stores the data. The browser holds only a tiny &#8220;session ID&#8221; cookie to reference that session.</p></blockquote><p>This is one of the oldest methods for traditional web authentication and, with the HttpOnly, Secure, and SameSite cookie flags set, still one of the most secure.</p><p>A nice property of this method is that the cookie can carry a token generated by any of the methods above, and you decide whether the session lives in Redis, in your database, or in a stateless token.</p><p>Here&#8217;s a simple code snippet combining a JWT with a session cookie:</p><pre><code>export function setJWTOAuth(user: any): any {
  const token = jwt.sign({ userId: user?.id }, TOKENSECRET, {
    expiresIn: "30 days",
  });

  const thirtyDays = 30 * 24 * 60 * 60 * 1000;

  const isProd = !["LOCAL", "DEVELOP"].includes(environment.context);

  // HttpOnly keeps the token away from client-side JavaScript;
  // SameSite=None (needed for cross-site requests) requires Secure
  const options: Option = {
    httpOnly: true,
    expires: new Date(Date.now() + thirtyDays),
    sameSite: isProd ? "none" : false,
    secure: isProd,
  };

  if (isProd) options.domain = ".example.com";

  return {
    options,
    token,
  };
}</code></pre><h3>OAuth2 / OpenID Connect</h3><p>OAuth2 is mostly used for integrations with third-party systems, for example letting users sign up or log in with Google, GitHub, or X (Twitter). Strictly speaking, OAuth2 is an authorization protocol; OpenID Connect adds an identity layer on top of it, which is what a &#8220;Log in with Google&#8221; button relies on. To do it properly, follow each provider&#8217;s integration documentation and learn how the OAuth2 and OpenID Connect flows work before building one.</p><div><hr></div><p>Next, let&#8217;s examine how to scale an authentication service:</p><h2>Scaling Authentication</h2><p>To scale your system to millions of users, you must think beyond tokens and implement the following properly:</p><ul><li><p><strong>Distributed Token Validation:</strong> Use a token format that can be validated without a central lookup, such as a signed JWT, so the gateway can verify tokens locally without hitting the database.</p></li><li><p><strong>Horizontal Scaling:</strong> Keep the authentication service stateless so you can scale it out with containers, for example on Kubernetes.</p></li><li><p><strong>Rate Limiting &amp; Abuse Protection:</strong> Add layers of protection, such as rate limiting, to defend against credential stuffing, brute force, token replay, and bot attacks.</p></li><li><p><strong>Monitoring &amp; Metrics:</strong> Track login success and failure rates, suspicious IPs, token refresh volume, the most common failed passwords (aggregated without ever logging them in plain text), MFA usage, and bot patterns. Set alerts on sudden login failure spikes, refresh token abuse, key-signing failures, and JWT verification failures.</p></li></ul><h2>Final Answer</h2><p>Here&#8217;s how to present your answer to the interviewer:</p><div class="pullquote"><p>&#8220;I&#8217;d design auth with a secure user store, token-based sessions (JWT or opaque tokens), refresh token flow, MFA, and centralized logging. At scale, I&#8217;d add Redis for session lookups and rate limiting for login endpoints.&#8221;</p></div><p>Designing a scalable authentication system shows your engineering prowess. 
It&#8217;s the beginning of understanding how complex and scalable backend systems work.</p><p>Every choice you make affects users at the very moment they try to access your main product.</p><p>That is where real engineering starts.</p><p>So the next time an interviewer asks, <strong>&#8220;How would you design an authentication system for a large-scale web application?&#8221;</strong> don&#8217;t just say, &#8220;<strong>I&#8217;ll use an existing solution.</strong>&#8221;</p><p>Walk them through your thinking.</p><p>Show them how you&#8217;d keep things secure, fast, and scalable even when millions of requests hit your system. That&#8217;s the part that shows you truly understand how real-world backend systems behave.</p><div><hr></div><p>I hope you learned something today. Spread the love by sharing this newsletter with at least two of your friends.</p><p>Also, let me know if you enjoy this new series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:<br><br><strong>Get a 50% discount on any of these courses. 
Reach out to me (reply to this email).</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 3 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com/">The MB Platform:</a> </strong>Join 1000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learning and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy:</a> </strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. 
<a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[How would you design a Rate-limiting system for APIs at scale?]]></title><description><![CDATA[How to design a rate-limiting system that protects your APIs at scale &#8212; and explain it confidently in your next backend interview.]]></description><link>https://kaperskyguru.substack.com/p/how-would-you-design-a-rate-limiting</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/how-would-you-design-a-rate-limiting</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 15 Nov 2025 17:04:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tzlx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9b724e-bb27-4b40-967d-6804efa6ea64_392x392.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#8220;&#128075;&#8221;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a 
href="https://masteringbackend.com/?ref=backend-weekly&amp;utm_source=newsletter.masteringbackend.com&amp;utm_medium=newsletter&amp;utm_campaign=api-and-api-design-building-enterprise-apis&amp;_bhlid=c51e58543792da03dbe2c0ad9109a794492143ad">Masteringbackend</a><strong> &#8594; An all-in-one platform that helps backend engineers become highly-paid backend and AI engineers by leveraging a practical-based learning approach</strong></em>.</p><div><hr></div><p>If this newsletter was shared with you, consider subscribing here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://kaperskyguru.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://kaperskyguru.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Here&#8217;s another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and interview design questions.</p><p><strong>Before we dive in:</strong></p><p>If you are still stuck in tutorial hell</p><p>We&#8217;ve all been there&#8212;jumping from one Python video to the next, but never building anything real. No portfolio. No confidence. No interviews.</p><p>That&#8217;s why I created this.</p><p><strong>The &#8220;Land Your Dream Python Job&#8221; Challenge</strong><br>A 90-day, 3-phase roadmap that helps you:</p><p>&#9989; Build 30 real-world backend projects in 30 days<br>&#9989; Master DSA for technical interviews<br>&#9989; Get job-ready with resumes, mock interviews &amp; daily job alerts<br>&#9989; And finally... land that backend job</p><p>This is NOT another course. It&#8217;s a challenge. And it works. 
It&#8217;s beginner-friendly.</p><p>Over <strong>2,000 Python developers</strong> have taken this path; many are now working at top companies.</p><p><strong>Only 120 slots left at $54 (then goes up to $100)</strong></p><p>Join the challenge &amp; change your future<br>&#128073; <a href="https://python30.masteringbackend.com/?utm_source=newsletter.masteringbackend.com&amp;utm_medium=referral&amp;utm_campaign=api-and-api-design-api-security-part-2">python30.masteringbackend.com</a></p><p>Let&#8217;s get you unstuck.</p><div><hr></div><p><strong>This is the MB Interview Series on Backend Weekly every Saturday.</strong></p><p><em>In this series, I will walk you through how to answer common backend engineering interview questions, covering topics such as system design, microservices, API design, and databases.</em></p><p>Let&#8217;s get started with episode 3 (<a href="https://kaperskyguru.substack.com/p/how-would-you-design-a-distributed-f91">Episode 2 Here</a>):</p><div><hr></div><h2>The Interview Scenario</h2><p>You&#8217;re in a backend interview.</p><p>They ask:</p><p><strong>&#8220;How would you design a rate-limiting system for APIs at scale?&#8221;</strong></p><p>Here&#8217;s how you should approach it:</p><div><hr></div><p>One more thing before we dive in: we are building the next Interview Prep Playground for backend engineers.</p><p>Join our MB Interview waitlist: <a href="https://tally.so/r/w46glb">https://tally.so/r/w46glb</a></p><div><hr></div><p>Make sure you have a solid understanding of the problem before attempting any solution.</p><h2>Understand the problem</h2><p>How easily you solve a problem depends on how well you understand it.</p><p>The case for rate limiting is simple:</p><p>If you want to protect your backend systems from overload, ensure fair use across tenants, and provide predictable SLA behavior, then you need rate limiting.</p><p>Rate limiting is the process of limiting the number of requests a client (user, API key, IP, or tenant) can 
make to your API within a time window.</p><p>For example, the window might be 1 minute or 2 seconds.</p><p>It prevents abuse, protects downstream services (DBs, third-party APIs), and lets you enforce product tiers.</p><p>Below are typical goals for a production-ready rate limiter:</p><ul><li><p>Protect system capacity and reduce overload.</p></li><li><p>Provide predictable latency and QoS (Quality of Service).</p></li><li><p>Enforce per-user, per-API-key, per-IP, per-route, and per-plan limits.</p></li><li><p>Support bursts while enforcing long-term fairness.</p></li><li><p>Work at millions of requests/sec across regions.</p></li></ul><p>The question in your mind is probably:</p><p>How do you design a rate-limiting system for APIs at scale when many services run in a microservice architecture?</p><p>Let&#8217;s look at the architecture of a simple distributed rate-limiting system.</p><h2>The High-Level Architecture</h2><p>At a high level, the core pieces are:</p><ul><li><p><strong>Generate App Server:</strong> Your backend services (Express, Go, Java, etc.). Requests land here after passing through the edge/gateway. Services might be regional.</p></li><li><p><strong>API Gateway / Edge:</strong> Primary place to enforce low-latency rate checks (Envoy, Kong, AWS API Gateway, Fastly). 
This is the first defensive layer.</p></li><li><p><strong>Distributed Store:</strong> Fast store for counters/tokens (Redis cluster, in-memory local caches, or an external quota service).</p></li><li><p><strong>Central Quota Service (optional):</strong> For billing/long-term quotas and complex policy enforcement.</p></li><li><p><strong>Monitoring/Policy Store:</strong> Where limits &amp; plans live (DB/Config service, fetched by gateway).</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!guCC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!guCC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png 424w, https://substackcdn.com/image/fetch/$s_!guCC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png 848w, https://substackcdn.com/image/fetch/$s_!guCC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png 1272w, https://substackcdn.com/image/fetch/$s_!guCC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!guCC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png" width="802" height="201" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:201,&quot;width&quot;:802,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27606,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/178959552?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!guCC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png 424w, https://substackcdn.com/image/fetch/$s_!guCC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png 848w, https://substackcdn.com/image/fetch/$s_!guCC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png 1272w, https://substackcdn.com/image/fetch/$s_!guCC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f8531a7-1197-41a5-ab9e-a07cd062975c_802x201.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a></figure></div><p>The edge consults the distributed store (or a local cache that syncs with it) to decide whether to allow or deny each request.</p><h2>Rate-Limiting Algorithms</h2><p>Below are the core strategies used across the industry. You can pick one and discuss it in depth with your interviewer.</p><h3>Fixed Window</h3><p>In a fixed-window algorithm, the system counts requests in fixed time intervals (such as every minute) and resets the counter when each interval starts. It is simple to implement, but a client can send up to twice the limit in a short span that straddles a window boundary.</p><pre><code>if requests[current_window] &lt; limit: requests[current_window] += 1; allow() else: deny()</code></pre><h3><strong>Sliding Window Log</strong></h3><p>With sliding-window log rate limiting, the system stores a timestamp for every request and counts only the ones within the current time window. This gives it excellent accuracy but makes it expensive and memory-heavy at large scale.</p><pre><code>log = [t for t in log if t &gt; now - window]; if len(log) &lt; limit: log.append(now); allow() else: deny()</code></pre><h3><strong>Sliding Window Counter</strong></h3><p>The sliding-window counter combines the simplicity of the fixed window with most of the accuracy of the log approach. It adds the current window&#8217;s count to the previous window&#8217;s count, weighted by the fraction of the previous window that still overlaps the sliding window. This makes it smoother during bursts and efficient enough for large-scale systems.</p><pre><code>count = curr_count + prev_count * overlap_ratio; if count &lt; limit: curr_count += 1; allow() else: deny()</code></pre><h3>Token Bucket</h3><p>In a token-bucket algorithm, each request consumes a token from the bucket while tokens continuously refill at a fixed rate. This lets the system enforce a steady average rate while still permitting short bursts up to the bucket&#8217;s capacity.</p><pre><code>tokens = min(max_tokens, tokens + refill_rate * dt); if tokens &gt;= 1: tokens -= 1; allow() else: deny()</code></pre><h3>Leaky Bucket</h3><p>The leaky-bucket algorithm forces requests to leave the system at a constant, fixed rate, 
regardless of how fast they arrive, making it useful for smoothing or shaping uneven traffic into a predictable, stable flow.</p><pre><code>if queue.size &lt; bucket_size: queue.push(req) else: reject(req)  # separately, a worker calls process_at_fixed_rate() to drain the queue</code></pre><p>I recommend the <strong>Sliding Window Counter algorithm</strong> for simplicity and fairness, or the <strong>token bucket</strong> as a practical choice for gateways that need burst tolerance and predictable RPS.</p><h3>Designing a Distributed Rate-Limiting System</h3><p>When designing a distributed rate-limiting system, you need to decide where to enforce limits, which limits to enforce, and where enforcement is unnecessary.</p><ul><li><p><strong>Edge/API Gateway:</strong> This is the primary place for rate limiting. It sits closest to the client, rejects excess traffic before it reaches your services, and adds the least latency.</p></li><li><p><strong>Sidecar / Service mesh (intra-cluster):</strong> For fine-grained control per service.</p></li><li><p><strong>Application layer (fallback):</strong> For business-rule limits or extra checks.</p></li><li><p><strong>Central quota/checker:</strong> Synchronous checks for billing limits, or asynchronous reconciliation for monthly quotas.</p></li><li><p><strong>Client SDK (soft):</strong> Client-side backoff to reduce noise.</p></li></ul><blockquote><p><strong>Note:</strong> Enforce simple decisions at the gateway (fast) with a Redis (or local) token bucket for atomic checks. 
Use a central quota service only where strong consistency/billing is required.</p></blockquote><h4>Distributed Design Considerations</h4><p>Now, let&#8217;s look at some key considerations when building a distributed rate-limiting architecture.</p><ul><li><p><strong>Single Redis vs Clustered Redis:</strong> At scale, use Redis Cluster (or client-side sharding) so that keys, and therefore load, are spread across shards.</p></li><li><p><strong>Consistent hashing:</strong> When sharding gateways or Redis nodes, consistent hashing reduces reshuffling when nodes change.</p></li><li><p><strong>Local token buckets:</strong> Gateways can use a local, in-process bucket and periodically sync to Redis. This optimistic approach lowers latency at the cost of some accuracy.</p></li><li><p><strong>Multi-region:</strong> Prefer region-local enforcement (low latency) and accept eventual consistency for cross-region quotas. For strict global quotas, route checks through a global coordinator (higher latency).</p></li><li><p><strong>Atomicity:</strong> Use Redis Lua scripts (atomic) to avoid race conditions under high concurrency.</p></li><li><p><strong>Hot keys:</strong> Detect and apply special handling (rate limit more aggressively, throttle, route to dedicated nodes).</p></li></ul><h4>Handling System Failures</h4><p>How should you deal with failures in your system?</p><p>You also need to share this with your interviewer and choose a strategy depending on the business.</p><ul><li><p><strong>Redis/Store unavailable:</strong> If Redis or your store is unavailable, what happens? 
Below are some of the strategies you can discuss with your interviewer.</p><ul><li><p><strong>Fail-closed (deny all):</strong> Safe against abuse, but it turns a store failure into an outage for every client.</p></li><li><p><strong>Fail-open (allow all):</strong> Keeps the API available but risks abuse and overload while the store is down.</p></li><li><p><strong>Pragmatic option (<em>fail-graceful</em>):</strong> Fall back to a local, more permissive leaky bucket with reduced accuracy, and raise alerts.</p></li></ul></li><li><p><strong>Clock drift &amp; time accuracy:</strong> Keep server clocks synchronized (for example, with NTP) and use server-side timestamps. Never rely on client clocks.</p></li><li><p><strong>Network partitions:</strong> Enforce locally and reconcile later for non-billing use cases.</p></li><li><p><strong>Cache stampede:</strong> When many requests miss the cache and hit the DB/central store at the same time, use request coalescing, locks, or &#8220;allow one repopulator&#8221;.</p></li><li><p><strong>Thundering herd on config changes/TTL expiry:</strong> Add jitter/randomized TTLs and stagger refreshes.</p></li></ul><h4>API &amp; UX for clients</h4><p>Always return clear headers and status codes:</p><ul><li><p>Successful responses:</p><ul><li><p><code>X-RateLimit-Limit: &lt;limit&gt;</code></p></li><li><p><code>X-RateLimit-Remaining: &lt;remaining&gt;</code></p></li><li><p><code>X-RateLimit-Reset: &lt;unix-timestamp&gt;</code></p></li></ul></li><li><p>When throttled:</p><ul><li><p>HTTP <code>429 Too Many Requests</code></p></li><li><p><code>Retry-After: &lt;seconds&gt;</code></p></li><li><p>Response body with a friendly message and suggested backoff</p></li></ul></li></ul><p>These help clients implement exponential backoff and graceful retry.</p><h4>Metrics &amp; monitoring</h4><p>Track these metrics &amp; set alerts early. 
You can also discuss this with your interviewer to see what works for the system you&#8217;re designing.</p><ul><li><p>Requests allowed / requests denied (429)</p></li><li><p>Rate check latency (p95, p99)</p></li><li><p>Redis errors/timeouts/connection pool usage</p></li><li><p>Token refill rate/capacity usage</p></li><li><p>Top offenders (per API key / IP)</p></li><li><p>Success/failure ratios of local vs remote checks</p></li></ul><p>Use Prometheus and Grafana, Datadog, or similar. Make sure to alert on sudden spikes in 429s, Redis errors, and increased check latency.</p><h4>Dynamic config &amp; policy management</h4><p>Store throttling policies (per plan, per API-key) in a config store (DynamoDB, PostgreSQL, Consul).</p><p>Push policies to gateways via:</p><ul><li><p>Hot reload API</p></li><li><p>Polling + ETag</p></li><li><p>Push via SSE / WebSocket for instant changes</p></li></ul><pre><code>// A simple policy

{
  "id": "plan-pro",
  "type": "token-bucket",
  "rate": 100,           // tokens per second
  "burst": 200,          // max burst tokens
  "scope": ["global", "per-api-key"]
}</code></pre><h4>Implementation Example</h4><p>Here&#8217;s a simple rate-limiting implementation using the Token Bucket algorithm with Redis + Lua (TypeScript).</p><ol><li><p>Let&#8217;s define a Lua script that implements the token bucket atomically:</p></li></ol><pre><code>// Lua script: atomic token bucket

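// A pure-TypeScript mirror of the refill-and-consume step that the Lua script
// below performs atomically inside Redis. This is a hypothetical helper (not
// part of ioredis or Redis). It does no I/O, so the token math can be
// unit-tested without a Redis server. Times are in milliseconds and
// refillPerMs is tokens per millisecond, matching the script's ARGV.
function refillAndConsume(
  state: { tokens: number; last: number },
  nowMs: number,
  refillPerMs: number,
  capacity: number,
  needed: number,
) {
  // Refill in proportion to elapsed time, capped at the bucket capacity.
  const elapsed = Math.max(0, nowMs - state.last);
  let tokens = Math.min(capacity, state.tokens + elapsed * refillPerMs);
  // Spend tokens only when enough are available; otherwise deny.
  const allowed = tokens >= needed;
  if (allowed) tokens -= needed;
  return { allowed, state: { tokens, last: nowMs } };
}
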
const lua = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])   -- tokens per ms
local capacity = tonumber(ARGV[3])
local tokens_needed = tonumber(ARGV[4])

local data = redis.call("HMGET", key, "tokens", "last")
local tokens = tonumber(data[1]) or capacity
local last = tonumber(data[2]) or now

-- refill
local elapsed = math.max(0, now - last)
local add = elapsed * refill_rate
tokens = math.min(capacity, tokens + add)
last = now

local allowed = 0
if tokens &gt;= tokens_needed then
  tokens = tokens - tokens_needed
  allowed = 1
end

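redis.call("HSET", key, "tokens", tokens, "last", last) -- multi-field HSET (Redis 4.0+); HMSET is deprecated
redis.call("PEXPIRE", key, 3600000) -- 1 hour TTL so idle buckets are cleaned up
-- note: Redis truncates Lua numbers to integers in replies, so tokens comes back whole
return { allowed, tokens }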
`;</code></pre><ol start="2"><li><p>TypeScript + ioredis example (token bucket via Redis Lua):</p></li></ol><pre><code>import Redis from "ioredis";
const redis = new Redis({ host: "redis-host", port: 6379 });

// Load the script once and call it by SHA (top-level await requires an ES module)
const sha = (await redis.script("LOAD", lua)) as string;

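// Hypothetical helper (not part of ioredis) that builds the namespaced key
// format suggested in the notes below, e.g. rl:api-key:abc123:/v1/orders
// usage: await tryConsume(rateLimitKey(apiKeyId, route))
function rateLimitKey(apiKeyId: string, route: string) {
  return `rl:api-key:${apiKeyId}:${route}`;
}
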
async function tryConsume(key: string, tokens = 1, ratePerSec = 100, capacity = 200) {
  const now = Date.now();
  const refillRatePerMs = ratePerSec / 1000;
  // The script replies with [allowed (0 or 1), remaining tokens]
  const res = (await redis.evalsha(sha, 1, key, now, refillRatePerMs, capacity, tokens)) as [number, number];
  return { allowed: res[0] === 1, tokensLeft: res[1] };
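}</code></pre><blockquote><p><strong>Notes:</strong> <code>ratePerSec</code> and <code>capacity</code> map to plan tiers. Use namespaced keys: <code>rl:api-key:&lt;id&gt;:&lt;route&gt;</code>. Keep the Lua script loaded (sha) for performance. If Redis restarts or its script cache is flushed, <code>EVALSHA</code> fails with a <code>NOSCRIPT</code> error, so reload the script and retry (ioredis&#8217;s <code>defineCommand</code> handles this fallback for you).</p></blockquote><h4>Fault Tolerance and Reliability</h4><p>Always have a DB/service fallback and circuit breakers around rate-store calls:</p><pre><code>try {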
  const { allowed } = await tryConsume(key);
  if (!allowed) return 429; // respond with 429 Too Many Requests and Retry-After
} catch (err) {
  logger.warn("Rate store failed; falling back to a permissive local bucket");
  // apply local soft quota
}</code></pre><p>Because every rate check is a read-modify-write, send checks to the primary of each shard. Use replicas for failover rather than for serving checks, since replica reads can lag behind the primary. Implement health checks and automatic failover for the store. Use exponential backoff and retry for store operations, but keep retries within a tight timeout budget so the rate check never blocks the request path for long.</p><h2>Final Answer</h2><p>Here&#8217;s exactly how to put your answer forward to the interviewer:</p><div class="pullquote"><p>&#8220;I&#8217;d design the rate-limiter using a token-bucket algorithm enforced at the API gateway for low latency, backed by a clustered Redis store with Lua scripts for atomicity. Use per-API-key and per-route policies stored in a config service with hot reload. Add randomized TTLs and request coalescing to avoid stampedes, local token buckets for latency optimization, and monitoring for hit/deny ratios. For multi-region, deploy regional clusters with eventual consistency for non-billing quotas and a central coordinator for strict global quotas. This balances performance, burst handling, and operational reliability.&#8221;</p></div><p>Designing a rate-limiting system might <em>sound</em> like a small piece of the puzzle. However, as you dig in, you realize it&#8217;s really about understanding real-world traffic, distributed systems, and how to keep your APIs healthy under pressure.</p><p>Every choice you make (the algorithm, the data store, how you handle bursts, how you sync counters) affects how stable and reliable your system will be when things get busy.</p><p>So the next time an interviewer asks, <strong>&#8220;How would you design a rate-limiter at scale?&#8221;</strong> don&#8217;t just say, &#8220;I&#8217;ll use Redis.&#8221;</p><p>Walk them through your thinking.</p><p>Show them how you&#8217;d keep things fair, fast, and scalable even when millions of requests hit your system. That&#8217;s the part that shows you truly understand how real-world backend systems behave.</p><div><hr></div><p>I hope you learned something today: Spread the love. 
Share this newsletter with at least two of your friends today.</p><p>Also, let me know if you enjoy this new series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:<br><br><strong>Get a 50% discount on any of these courses. Reach out to me (Reply to this mail)</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 4 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com/">The MB Platform:</a> </strong>Join 1000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. 
<a href="https://masteringbackend.com/academy">The MB Academy:</a> </strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>4. <a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[How would you design a Distributed Cache for a High-Traffic System?]]></title><description><![CDATA[How to design a distributed cache that serves millions of requests per second &#8212; and explain it like a pro in your next backend interview.]]></description><link>https://kaperskyguru.substack.com/p/how-would-you-design-a-distributed-f91</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/how-would-you-design-a-distributed-f91</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 01 Nov 2025 15:30:42 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!1Kzd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#8220;&#128075;&#8221;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a href="https://masteringbackend.com/?ref=backend-weekly&amp;utm_source=newsletter.masteringbackend.com&amp;utm_medium=newsletter&amp;utm_campaign=api-and-api-design-building-enterprise-apis&amp;_bhlid=c51e58543792da03dbe2c0ad9109a794492143ad">Masteringbackend</a><strong> &#8594; An all-in-one platform that helps backend engineers become highly-paid backend and AI engineers by leveraging a practical-based learning approach</strong></em>.</p><div><hr></div><p>If this newsletter was shared with you, consider subscribing here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://kaperskyguru.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://kaperskyguru.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Here&#8217;s another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering backend engineering through real-world systems and interview design questions.</p><p><strong>Before we dive in:</strong></p><p>If you are still stuck in tutorial hell</p><p>We&#8217;ve all been there&#8212;jumping from one Python video to the next, but never building anything real. No portfolio. No confidence. 
No interviews.</p><p>That&#8217;s why I created this.</p><p><strong>The &#8220;Land Your Dream Python Job&#8221; Challenge</strong><br>A 90-day, 3-phase roadmap that helps you:</p><p>&#9989; Build 30 real-world backend projects in 30 days<br>&#9989; Master DSA for technical interviews<br>&#9989; Get job-ready with resumes, mock interviews &amp; daily job alerts<br>&#9989; And finally... land that backend job</p><p>This is NOT another course. It&#8217;s a challenge. And it works. It&#8217;s beginner-friendly.</p><p>Over <strong>2,000 Python developers</strong> have taken this path&#8212;many are now working at top companies.</p><p><strong>Only 120 slots left at $54 (then goes up to $100)</strong></p><p>Join the challenge &amp; change your future<br>&#128073; <a href="https://python30.masteringbackend.com/?utm_source=newsletter.masteringbackend.com&amp;utm_medium=referral&amp;utm_campaign=api-and-api-design-api-security-part-2">python30.masteringbackend.com</a></p><p>Let&#8217;s get you unstuck.</p><div><hr></div><p><strong>This is the MB Interview Series on Backend Weekly every Saturday.</strong></p><p><em>In this series, I will walk you through how to answer common backend engineering interview questions, covering topics such as system design, microservices, API design, and databases.</em></p><p>Let&#8217;s get started with episode 2 (<a href="https://kaperskyguru.substack.com/p/how-would-you-design-a-distributed?r=6bmjcc">Episode 1 Here</a>):</p><div><hr></div><h2>The Interview Scenario</h2><p>You&#8217;re in a backend interview. </p><p>They ask: </p><p><em><strong>&#8220;How would you design a distributed cache for a high-traffic system?&#8221;</strong></em> </p><p>Here&#8217;s how to approach it:</p><div><hr></div><p>Before we dive in, we are building the next Interview Prep Playground targeting backend engineers. 
</p><p>Join our MB Interview waitlist: <a href="https://tally.so/r/w46glb">https://tally.so/r/w46glb</a></p><div><hr></div><p>Now, let&#8217;s start by clarifying <strong>why</strong> caching exists in the first place.</p><h3>Understand the problem</h3><p>How easily you solve a problem depends on how well you understand it.</p><p>If you want to boost the performance of your backend systems, then you need caching.</p><p><strong>Caching is the process of storing frequently accessed data temporarily in high-speed storage. This storage is called a cache.</strong></p><p>Caching speeds up retrieval of frequently requested data because reads are served from the cache instead of going to the database.</p><p>The goals are to:</p><ul><li><p>Reduce DB load.</p></li><li><p>Improve latency for frequently accessed data.</p></li><li><p>Handle millions of requests with low overhead.</p></li></ul><p>How do you design a distributed cache for a high-traffic system where multiple services run in a microservice architecture?</p><h3>The High-Level Architecture</h3><p>Let&#8217;s look at the high-level architecture for this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JmOL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JmOL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png 424w, 
https://substackcdn.com/image/fetch/$s_!JmOL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png 848w, https://substackcdn.com/image/fetch/$s_!JmOL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png 1272w, https://substackcdn.com/image/fetch/$s_!JmOL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JmOL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png" width="655" height="292" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:292,&quot;width&quot;:655,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26254,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/177724439?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JmOL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png 
424w, https://substackcdn.com/image/fetch/$s_!JmOL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png 848w, https://substackcdn.com/image/fetch/$s_!JmOL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png 1272w, https://substackcdn.com/image/fetch/$s_!JmOL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ecf30-eea8-46fb-a3ea-287218dce5ea_655x292.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is the core of our architecture:</p><ul><li><p><strong>Generate App Server:</strong> Backend servers, such as an Express or Go server, that handle all requests. These servers can be distributed across regions and built with any server-side language.</p></li><li><p><strong>Cache Layer: </strong>One of the variants described below: a local or a centralized cache.</p></li><li><p><strong>Redis Cluster:</strong> Assuming we pick Redis as our cache service, we can add as many Redis nodes or clusters as our throughput and latency needs require.</p></li><li><p><strong>Database:</strong> The centralized database for our application.</p></li></ul><p>Here are the three variants you can choose from:</p><ol><li><p><strong>Local Cache (In-memory): </strong>Fastest but inconsistent across nodes. Example: using <code>node-cache</code> or <code>lru-cache</code>.</p></li><li><p><strong>Centralized Cache (e.g., Redis, Memcached): </strong>Shared across instances, so every instance sees the same view. Needs replication and scaling.</p></li><li><p><strong>Hybrid Cache: </strong>A local cache for ultra-fast reads, backed by a centralized cache for synchronization.</p></li></ol><h3>Caching Strategies</h3><p>Next, let&#8217;s explore some caching strategies. Here are three classic ones:</p><ol><li><p><strong>Cache-Aside (Lazy Loading): </strong>The app checks the cache first; on a miss, it fetches the data from the database and writes it to the cache for subsequent reads. It&#8217;s simple and only caches data that is actually read, but the first read of each key (a cold start) pays the full database latency.</p></li><li><p><strong>Write-Through: </strong>This strategy writes to the cache and the DB together. 
Cached data stays fresh, at the cost of higher write latency.</p></li><li><p><strong>Write-Behind: </strong>This strategy writes to the cache first and then writes to the DB asynchronously. Writes are faster, but data can be lost if the cache fails before it is flushed to the DB.</p></li></ol><p>You can choose the Cache-Aside strategy because it&#8217;s simple, scalable, and the most common choice in production-ready backend systems.</p><h3>Eviction and Consistency Policy</h3><p>Memory is finite, so at some point the cache fills up and you must evict entries to make room. That&#8217;s where choosing the right eviction policy comes in.</p><p>Here are some of the common eviction policies:</p><ul><li><p><strong>LRU (Least Recently Used):</strong> evicts the key that has gone the longest without being accessed.</p></li><li><p><strong>LFU (Least Frequently Used):</strong> evicts the keys with the fewest accesses.</p></li><li><p><strong>TTL (Time-to-Live):</strong> automatically removes keys once they expire.</p></li></ul><p>For the <strong>consistency model</strong>, think about what happens when the underlying data changes: until the cached copy is updated or removed, your cache might serve stale data.</p><p>You can use these strategies to solve that problem:</p><ul><li><p>Invalidate (delete) the cached key on every write.</p></li><li><p>Add short TTLs to bound how long stale data can live.</p></li><li><p>Use event-driven updates (e.g., publish DB changes to the cache via Kafka).</p></li></ul><p>While building a distributed cache, you might run into some challenges, and it&#8217;s good to share them with your interviewer.</p><h3>Common Distributed Challenges</h3><p>Let&#8217;s explore some of the challenges and how to solve them:</p><h4>1. Cache Stampede</h4><p>A cache stampede happens when many requests hit a missing (or just-expired) key simultaneously and all of them go to the database at once.</p><p><strong>Fixes:</strong></p><ul><li><p>Use <strong>request coalescing</strong>. 
Design the system so that only one request repopulates the cache while the others wait for its result.</p></li><li><p>Use the <strong>&#8220;lock &amp; populate&#8221;</strong> mechanism with Redis locks: the request that acquires the lock rebuilds the entry, and the others wait or retry briefly.</p></li></ul><h4>2. Thundering Herd Problem</h4><p>Next, let&#8217;s look at the thundering herd problem. Here, many keys expire at the same moment, which causes a sudden spike in database load.</p><p>To fix this, add <strong>randomized TTLs</strong> (jitter) or <strong>soft TTLs </strong>so that keys don&#8217;t all expire simultaneously.</p><pre><code>// 1-hour base TTL plus up to 5 minutes of jitter (EX must be an integer)
await redis.set(cacheKey, data, { EX: 3600 + Math.floor(Math.random() * 300) });</code></pre><p>Next, let&#8217;s look at how to scale a distributed cache.</p><h3>Scaling the Distributed Cache</h3><p>Scaling a distributed cache is hard, so let&#8217;s keep it simple for your interviews.</p><p>Once your cache outgrows the memory or throughput of a single node, single-node Redis won&#8217;t cut it.</p><p>Here&#8217;s how to scale it:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Kzd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Kzd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png 424w, https://substackcdn.com/image/fetch/$s_!1Kzd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png 848w, 
https://substackcdn.com/image/fetch/$s_!1Kzd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png 1272w, https://substackcdn.com/image/fetch/$s_!1Kzd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Kzd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png" width="1381" height="741" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:741,&quot;width&quot;:1381,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:73366,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/177724439?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Kzd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png 424w, 
https://substackcdn.com/image/fetch/$s_!1Kzd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png 848w, https://substackcdn.com/image/fetch/$s_!1Kzd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png 1272w, https://substackcdn.com/image/fetch/$s_!1Kzd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e301e4f-2628-472e-a28c-11bc9ea87554_1381x741.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Here are some of the techniques used to scale a distributed cache:</p><ul><li><p><strong>Sharding:</strong> Split keys across multiple servers so each one holds only part of the data set.</p></li><li><p><strong>Consistent Hashing:</strong> When nodes are added or removed, only a small fraction of keys move, instead of a massive reshuffle.</p></li><li><p><strong>Replication:</strong> Use replicas for fault tolerance.</p></li><li><p><strong>Multi-Region Caches:</strong> Deploy regional cache clusters close to users.</p></li></ul><p>As you scale your distributed cache, the following additions will come in handy.</p><h4>Fault Tolerance and Reliability</h4><p>We need to consider what happens if our cache server fails. As with any production-ready component, a cache failure should degrade performance gracefully rather than break the request flow of our backend system.</p><p>Here are some best practices to build a fault-tolerant and reliable system:</p><ul><li><p>Always have a DB fallback.</p></li><li><p>Set timeouts and circuit breakers around cache calls.</p></li><li><p>Monitor cache hit ratios and latency.</p></li></ul><pre><code>async function getWithFallback(key) {
  try {
    const cached = await redis.get(key);
    if (cached !== null) return cached; // cache hit
  } catch (error) {
    // Cache is down or timed out: log it and fall through to the database
    logger.warn('Cache unavailable, falling back to DB', error);
  }
  // Cache miss or cache failure: read from the database (the source of truth)
  return await db.query(key);
}</code></pre><p>This example returns the cached value on a hit and falls back to the database on a cache miss or when the cache server is unreachable. It assumes <code>redis</code>, <code>logger</code>, and <code>db</code> are clients you have already configured.</p><p>We also need to monitor our cache servers just as closely as our main servers.</p><h4>Observability and Metrics</h4><p>A distributed system is not production-ready without observability. Here are some metrics you can track as you build your distributed cache system:</p><ul><li><p>Cache hit ratio</p></li><li><p>Eviction count</p></li><li><p>Latency (p95, p99)</p></li><li><p>Connection pool usage</p></li><li><p>Replication lag</p></li></ul><p>You can use tools such as <strong>Prometheus + Grafana</strong> or <strong>Datadog</strong>. They show you when and how your cache starts misbehaving.</p><p>Here are a few ideas to show advanced thinking:</p><ul><li><p><strong>Hot Key Detection:</strong> Track frequently requested keys and prefetch them.</p></li><li><p><strong>Near Cache Pattern:</strong> Keep a local in-memory cache synced with the distributed cache.</p></li><li><p><strong>Compression:</strong> Compress large cached values to save memory.</p></li><li><p><strong>Sliding Expiration:</strong> Extend a key&#8217;s TTL each time it is accessed, so actively used keys stay cached.</p></li></ul><h2>Final Answer</h2><p>Here&#8217;s exactly how to put your answer forward to the interviewer:</p><div class="pullquote"><p>&#8220;I&#8217;d design a distributed caching system using Redis with the cache-aside pattern, LRU eviction, sharding via consistent hashing, and replication for fault tolerance. I&#8217;d add randomized TTLs and request coalescing to avoid cache stampedes, and expose metrics for observability. 
This design balances speed, consistency, and scalability for high-traffic systems.&#8221;</p></div><p>Designing a distributed caching system might sound simple &#8212; but it&#8217;s a deep dive into <strong>scalability, data consistency, distributed systems, and real-world resilience.</strong></p><p>Every decision &#8212; from TTLs to hashing &#8212; impacts performance and reliability.</p><p>So next time an interviewer asks, <em>&#8220;How would you design a distributed cache?&#8221;</em><br>Don&#8217;t just say &#8220;I&#8217;ll use Redis.&#8221;</p><p>Walk them through <strong>how you&#8217;d make it reliable, scalable, and production-grade.</strong></p><div><hr></div><p>I hope you learned something today. Spread the love: share this newsletter with at least two of your friends today.</p><p>Also, let me know if you&#8217;re enjoying this new series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:<br><br><strong>Get a 50% discount on any of these courses. 
Reach out to me (Reply to this mail)</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 4 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com/">The MB Platform:</a> </strong>Join 1000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy: </a></strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. 
<a href="https://masteringbackend.com/courses">MB Video-Based Courses:</a></strong> Join 1000+ backend engineers who learn from our meticulously crafted courses designed to empower you with the knowledge and skills you need to excel in backend development.</p><p><strong>4. <a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon (<a href="http://solomoneseme.com">solomoneseme.com</a>)</p>]]></content:encoded></item><item><title><![CDATA[How would you design a distributed job scheduling system like Cron at scale]]></title><description><![CDATA[What would you do if you had to design Cron &#8212; but for millions of jobs, across multiple servers, at global scale?]]></description><link>https://kaperskyguru.substack.com/p/how-would-you-design-a-distributed</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/how-would-you-design-a-distributed</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Sat, 25 Oct 2025 15:00:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6fnL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#128075;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a href="https://masteringbackend.com/?ref=backend-weekly&amp;utm_source=newsletter.masteringbackend.com&amp;utm_medium=newsletter&amp;utm_campaign=api-and-api-design-building-enterprise-apis&amp;_bhlid=c51e58543792da03dbe2c0ad9109a794492143ad">Masteringbackend</a><strong> &#8594; An all-in-one platform that helps backend engineers become highly-paid backend and AI engineers by leveraging a practical-based learning approach</strong></em>.</p><div><hr></div><p>Welcome to another issue of <strong>Backend Weekly</strong> &#8212; your favorite newsletter on mastering 
backend engineering through real-world systems and interview design questions.</p><p><strong>Before we dive in:</strong><br><br>We recently moved <strong>Backend Weekly</strong> from <strong>Beehiiv</strong> to <strong>Substack!</strong><br>You don&#8217;t have to do anything &#8212; same great weekly content, just a new home.</p><div><hr></div><p>Welcome to the interview series on Backend Weekly.</p><p><em>In this series, I will walk you through how to answer common backend engineering interview questions across different backend engineering topics, such as system design, microservices, and databases.</em></p><p>Let&#8217;s get started with episode 1:</p><div><hr></div><h2>The Interview Scenario</h2><p>You&#8217;re in a backend interview.</p><p>They ask:</p><p><em><strong>&#8220;How would you design a distributed job scheduling system like Cron but at scale?&#8221;</strong></em></p><p>Here&#8217;s how to approach it:</p><h2>Understand the Problem</h2><p>The first step toward solving any problem is understanding it inside and out.</p><p>Let&#8217;s break down the problem:</p><p>A cron service works great when you&#8217;re on a single machine. 
However, what if you&#8217;re running millions of jobs, recurring or one-time, across hundreds of servers, regions, and tenants?</p><p>Below are some of the problems you will have to worry about:</p><ul><li><p>Running each job exactly once, at the correct time (in practice, at-least-once delivery combined with idempotent jobs).</p></li><li><p>No duplicate or missed executions.</p></li><li><p>Scalability and handling spikes in scheduled jobs.</p></li><li><p>Fault tolerance: what happens when a node dies mid-execution?</p></li></ul><p>Once you take these problems (and more) into consideration, you&#8217;re no longer designing a simple, single-machine Cron service.</p><h2>The High-Level Architecture</h2><p>Now that we understand the challenges of building a distributed Cron system, let&#8217;s visualize the architecture.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6fnL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6fnL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png 424w, https://substackcdn.com/image/fetch/$s_!6fnL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png 848w, https://substackcdn.com/image/fetch/$s_!6fnL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png 1272w, 
https://substackcdn.com/image/fetch/$s_!6fnL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6fnL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png" width="781" height="382" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53457279-7c75-4889-8b39-07d4999fdff9_781x382.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:382,&quot;width&quot;:781,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33846,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/177075790?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6fnL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png 424w, https://substackcdn.com/image/fetch/$s_!6fnL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png 848w, https://substackcdn.com/image/fetch/$s_!6fnL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png 
1272w, https://substackcdn.com/image/fetch/$s_!6fnL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53457279-7c75-4889-8b39-07d4999fdff9_781x382.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Here&#8217;s how each part fits and what it does:</p><ul><li><p><strong>Client/API Gateway:</strong> This is where the jobs are created, updated, and deleted via API or UI.</p></li><li><p><strong>Job Service:</strong> The job service stores the job definitions and metadata such as interval, next run time, owner, payload, 
etc.</p></li><li><p><strong>Scheduler Service:</strong> The scheduler service continuously checks which jobs are due and dispatches them for execution.</p></li><li><p><strong>Distributed Queue (Kafka, RabbitMQ, SQS):</strong> This buffers and distributes jobs to available workers.</p></li><li><p><strong>Worker Nodes:</strong> Worker nodes execute the actual jobs (by running a script, making HTTP calls, or performing database operations) and report the results.</p></li><li><p><strong>Database:</strong> The database persists job definitions, execution history, and audit logs.</p></li></ul><h2>Scheduling Logic</h2><p>Now let&#8217;s talk about how jobs are actually triggered.</p><p>All jobs are stored in the database with metadata like:</p><pre><code>job_id, interval, next_run_at, owner, status</code></pre><p>Storing this metadata for each job is the responsibility of the Job Service we mentioned before.</p><p>The scheduler periodically polls the database for jobs whose <code>next_run_at</code> has passed, and then:</p><ul><li><p>The scheduler pushes each due job to the distributed queue.</p></li><li><p>Workers consume jobs from the queue and execute them.</p></li><li><p>Results and timestamps are written back to the database, and <code>next_run_at</code> is advanced for recurring jobs.</p></li></ul><p>Let&#8217;s visualize this process:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fJqN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fJqN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png 424w, 
https://substackcdn.com/image/fetch/$s_!fJqN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png 848w, https://substackcdn.com/image/fetch/$s_!fJqN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png 1272w, https://substackcdn.com/image/fetch/$s_!fJqN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fJqN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png" width="751" height="171" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e23878bd-0489-4724-9d75-efd54b6373e1_751x171.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:171,&quot;width&quot;:751,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12052,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/177075790?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fJqN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png 
424w, https://substackcdn.com/image/fetch/$s_!fJqN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png 848w, https://substackcdn.com/image/fetch/$s_!fJqN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png 1272w, https://substackcdn.com/image/fetch/$s_!fJqN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe23878bd-0489-4724-9d75-efd54b6373e1_751x171.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The scheduling logic looks simple, right? But things get tricky once it becomes distributed.</p><p>With the scheduling logic in place, let&#8217;s look at the challenges we might face when handling millions of jobs.</p><h2>The Challenges</h2><p>Let&#8217;s explore some of the challenges and how to solve them:</p><h3>1. Avoiding Duplicate Runs</h3><p>Let&#8217;s consider what happens when you have <strong>multiple scheduler nodes</strong> running.</p><p>Each node might poll the same database table and try to execute the same job simultaneously. 
That would run the same job twice, which is unacceptable for a scheduling service.</p><p>Here are a few ways to prevent duplicates:</p><ul><li><p><strong>Distributed Locks (Redis lock):</strong> Before executing, each scheduler node tries to acquire a Redis lock for the job (with an expiry, so a crashed node cannot hold it forever), and only the node that acquires the lock proceeds.</p></li><li><p><strong>Database Row Locks:</strong> Use transactional row-level locking when updating <code>next_run_at</code> or <code>status</code> (for example, <code>SELECT ... FOR UPDATE SKIP LOCKED</code> in PostgreSQL), so only one scheduler can claim a given job.</p></li><li><p><strong>Consistent Hashing: </strong>Assign each job to a specific scheduler node based on a hash of its ID, so no two schedulers own the same job.</p></li><li><p><strong>Timestamps: </strong>Track <code>last_executed_at</code> to prevent re-runs of the same scheduled occurrence.</p></li></ul><p>Avoiding duplicates is one of the core challenges of a scheduling service. Now that we have covered it, let&#8217;s see how to scale the system to handle millions of jobs.</p><h3>2. Scaling the System</h3><p>When you&#8217;re handling <strong>millions of jobs</strong>, a single scheduler won&#8217;t cut it.</p><p>Therefore, you can explore the following scalability options:</p><ul><li><p><strong>Partitioning jobs</strong>: Partition jobs by user, tenant, or time window.</p></li><li><p><strong>Sharding queues</strong>: Each scheduler writes to its own queue partition.</p></li><li><p><strong>Leader election</strong>: Use ZooKeeper or etcd to coordinate which schedulers are active.</p></li><li><p><strong>Auto-scaling workers</strong>: Scale the worker pool independently of the schedulers, for example based on queue depth.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0yAC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!0yAC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png 424w, https://substackcdn.com/image/fetch/$s_!0yAC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png 848w, https://substackcdn.com/image/fetch/$s_!0yAC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png 1272w, https://substackcdn.com/image/fetch/$s_!0yAC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0yAC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png" width="1062" height="151" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:151,&quot;width&quot;:1062,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17684,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/177075790?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0yAC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png 424w, https://substackcdn.com/image/fetch/$s_!0yAC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png 848w, https://substackcdn.com/image/fetch/$s_!0yAC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png 1272w, https://substackcdn.com/image/fetch/$s_!0yAC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1781f537-7220-4462-9f2e-daf823f716d6_1062x151.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Now your system can scale elastically with load. However, parts of it will still fail at some point, and that&#8217;s inevitable. So how do we handle failures without bringing the scheduling service down?</p><h3>3. 
Fault Tolerance</h3><p>Every distributed system experiences failures at some point.</p><p>The interviewer will therefore expect your system to be designed for fault tolerance and to recover gracefully from failures.</p><p>Here&#8217;s how to achieve fault tolerance in our system:</p><ul><li><p><strong>Worker Node Crashes:</strong> Make your workers use <strong>ack/nack</strong>: a worker acknowledges a job only after it finishes, so if a worker dies mid-task, the unacknowledged job is re-queued for another worker.</p></li><li><p><strong>Scheduler Fails or Restarts:</strong> Keep your schedulers stateless (all job state lives in the database), so any scheduler can be restarted or replaced freely.</p></li><li><p><strong>Persistence: </strong>Store every job&#8217;s lifecycle (created, scheduled, executed, failed, retried) in the DB for auditing and replay.</p></li></ul><p>Recovering gracefully from failures is essential when building for millions of jobs. The next most important thing is being able to see when something goes wrong and what caused it.</p><h3>4. Observability &amp; Management</h3><p>While designing this system and answering your interviewer&#8217;s question, remember that a scheduler service is not only about execution but also about visibility.</p><p>You need monitoring and observability in place so you can continuously learn how the service behaves, where to improve it, and which bugs to fix.</p><p>Set up the following:</p><ul><li><p><strong>APIs and dashboards</strong> to check job status, next runs, and logs.</p></li><li><p><strong>Metrics:</strong></p><ul><li><p>Job latency</p></li><li><p>Success/failure rates</p></li><li><p>Queue depth</p></li><li><p>Worker throughput</p></li></ul></li><li><p><strong>Retries with exponential backoff</strong> for transient failures.</p></li></ul><p>This helps SREs and developers <strong>trust the system</strong>. 
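</p><p>The retries-with-exponential-backoff item in the list above is easy to get subtly wrong, so here is a minimal Python sketch of how a worker might retry a job. It assumes a hypothetical <code>TransientError</code> exception for retryable failures (timeouts, temporary outages) and uses &#8220;full jitter&#8221;, a random delay below the exponential ceiling, which is one common variant rather than the only correct choice. The <code>base</code>, <code>cap</code>, and <code>max_attempts</code> defaults are illustrative, not tuned values.</p><pre><code>import random
import time


class TransientError(Exception):
    """Hypothetical exception type for failures that are worth retrying."""


def backoff_delays(base=0.5, cap=30.0, max_attempts=5, rng=random.random):
    """Yield one delay (in seconds) per retry, using full jitter.

    The ceiling doubles on every retry (base, 2*base, 4*base, ...) but never
    exceeds cap; the actual delay is a random fraction of that ceiling, so
    many jobs that failed together do not all retry at the same instant.
    """
    for retry in range(max_attempts - 1):
        ceiling = min(cap, base * 2 ** retry)
        yield ceiling * rng()


def run_with_retries(job, sleep=time.sleep, **backoff_options):
    """Run job(); on TransientError wait and retry, re-raising when out of retries."""
    delays = backoff_delays(**backoff_options)
    while True:
        try:
            return job()
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: let the caller mark the job as failed
            sleep(delay)
</code></pre><p>Injecting <code>sleep</code> and <code>rng</code> keeps the loop easy to test without real waiting. In a real worker you would also record each attempt (attempt number, error, next run time) in the job table so that retries show up on the dashboards and metrics described above.</p><p>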
Solving all these challenges puts your scheduling service in a position to handle millions of jobs. However, we can still optimize further.</p><h2>Optimization Tricks</h2><p>Now that the core of the system is in place, you can start making it smarter.</p><p>Here are some optimization tricks you can add, or discuss with your interviewer, to give yourself an edge in the interview:</p><ul><li><p><strong>Priority Queues: </strong>Urgent jobs get scheduled ahead of less urgent ones.</p></li><li><p><strong>Redis Cache: </strong>Cache active or hot jobs to reduce DB polling.</p></li><li><p><strong>Batching: </strong>Combine small periodic jobs into one batch job to save resources.</p></li><li><p><strong>Adaptive Polling: </strong>Dynamically adjust the scheduler&#8217;s polling interval based on system load.</p></li></ul><h2>Final Answer</h2><p>Here&#8217;s how to present your final answer to the interviewer:</p><blockquote><p>&#8220;I will design a distributed, fault-tolerant job scheduling system with a central job database, horizontally scaled schedulers using Redis locks for deduplication, a distributed queue for task dispatching, and stateless workers for execution &#8212; observable, elastic, and cloud-ready.&#8221;</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EDBH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EDBH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png 424w, 
https://substackcdn.com/image/fetch/$s_!EDBH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png 848w, https://substackcdn.com/image/fetch/$s_!EDBH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png 1272w, https://substackcdn.com/image/fetch/$s_!EDBH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EDBH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png" width="1231" height="231" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:231,&quot;width&quot;:1231,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26155,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/177075790?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!EDBH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png 424w, https://substackcdn.com/image/fetch/$s_!EDBH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png 848w, https://substackcdn.com/image/fetch/$s_!EDBH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png 1272w, https://substackcdn.com/image/fetch/$s_!EDBH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b6b759-8d80-48f3-bfae-d769581e4cc7_1231x231.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2>Closing Thoughts</h2><p>Designing something that sounds as simple as &#8220;cron at scale&#8221; teaches you nearly every backend concept that matters:</p><ul><li><p>Distributed locking</p></li><li><p>Leader election</p></li><li><p>Message queues</p></li><li><p>Fault tolerance</p></li><li><p>Observability</p></li><li><p>Scalability</p></li></ul><p>It&#8217;s a masterclass in <strong>system design thinking</strong>.</p><p>So the next time an interviewer asks you this question, don&#8217;t just say &#8220;I&#8217;ll use a queue.&#8221;</p><p>Walk them through <strong>why</strong> and <strong>how</strong> you&#8217;d make it reliable, scalable, and production-grade.</p><div><hr></div><p>I hope you learned something today. Spread the love. 
Share this newsletter with at least two of your friends today.</p><p>Also, let me know if you enjoy this new series and if you want me to continue breaking down interview questions like this.</p><div><hr></div><p>Remember to start learning backend engineering from our courses:<br><br><strong>Get a 50% discount on any of these courses. Reach out to me (Reply to this mail)</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 4 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com/">The MB Platform:</a> </strong>Join 1000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. 
<a href="https://masteringbackend.com/academy">The MB Academy: </a></strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. <a href="https://masteringbackend.com/courses">MB Video-Based Courses:</a></strong> Join 1000+ backend engineers who learn from our meticulously crafted courses designed to empower you with the knowledge and skills you need to excel in backend development.</p><p><strong>4. <a href="https://getbackendjobs.com/?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075;</p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon</p>]]></content:encoded></item><item><title><![CDATA[API and API Design: API Performance]]></title><description><![CDATA[API Performance is crucial when talking about APIs and API design. 
It refers to the efficiency and speed at which a developed API can execute tasks and return responses.]]></description><link>https://kaperskyguru.substack.com/p/api-and-api-design-api-performance</link><guid isPermaLink="false">https://kaperskyguru.substack.com/p/api-and-api-design-api-performance</guid><dc:creator><![CDATA[Solomon Eseme]]></dc:creator><pubDate>Thu, 16 Oct 2025 06:19:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!N1XR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello &#128075;</p><p><em>Welcome to another week, another opportunity to become a Great Backend Engineer.</em></p><p><em>Today&#8217;s issue is brought to you by <a href="https://masteringbackend.com/?ref=backend-weekly&amp;utm_source=newsletter.masteringbackend.com&amp;utm_medium=newsletter&amp;utm_campaign=api-and-api-design-building-enterprise-apis&amp;_bhlid=c51e58543792da03dbe2c0ad9109a794492143ad">Masteringbackend</a><strong> &#8594; An all-in-one platform that helps backend engineers become highly paid backend and AI engineers through a practice-based learning approach</strong></em>.</p><div><hr></div><h3>Public Announcement:</h3><p>Welcome back, Great Backend Engineers. I recently moved this newsletter from Beehiiv to Substack.</p><p>This is a notice that you&#8217;re now receiving this email from Substack. Everything else remains the same. 
You&#8217;ll get the same great weekly backend engineering content, now delivered through Substack.</p><div><hr></div><p>Previously, I covered the <a href="https://newsletter.masteringbackend.com/p/api-and-api-design-api-security-part-2">top 10 security risks from the OWASP Top 10</a> list. In this issue, I will explore API performance, another key part of API design.</p><p>API performance is crucial when talking about APIs and API design. It refers to how quickly and efficiently an API executes tasks and returns responses.</p><h2>What is API Performance?</h2><p>API performance describes how fast an API responds to requests and how efficiently it uses resources (CPU, memory, network, and database capacity) to produce those responses. In practice, it is usually tracked as latency (how long each request takes) and throughput (how many requests the API can handle in a given time).</p><p>API performance directly affects the responsiveness of an application, determining how quickly data can be exchanged, processed, and presented to the end user.</p><p>When your API is slow, the end user feels it directly. 
</p><p>Improving your API&#8217;s performance therefore improves the user experience and the overall performance of every application that integrates with it.</p><p>Next, let&#8217;s explore seven strategies to improve the performance of your backend APIs.</p><h3>Top 7 Strategies to 10x Your API Performance</h3><p>Below are seven effective strategies to boost your API&#8217;s speed and efficiency:</p><ul><li><p>Caching</p></li><li><p>Pagination</p></li><li><p>Avoid N+1 Queries</p></li><li><p>Compression</p></li><li><p>Connection Pooling</p></li><li><p>Serialization</p></li><li><p>Use Asynchronous Logging</p></li></ul><h4>1. Use Caching</h4><p>Caching is one of the most powerful techniques to improve API performance. By storing frequently accessed data in memory, you reduce the need to repeatedly fetch data from databases or external services.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N1XR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N1XR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png 424w, https://substackcdn.com/image/fetch/$s_!N1XR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png 848w, 
https://substackcdn.com/image/fetch/$s_!N1XR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png 1272w, https://substackcdn.com/image/fetch/$s_!N1XR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N1XR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png" width="721" height="570" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:721,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30864,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://kaperskyguru.substack.com/i/176182482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N1XR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png 424w, https://substackcdn.com/image/fetch/$s_!N1XR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png 
848w, https://substackcdn.com/image/fetch/$s_!N1XR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png 1272w, https://substackcdn.com/image/fetch/$s_!N1XR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd2177ae-8a59-4f9b-935a-334ba63ae9e1_721x570.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">A diagram showing caching strategies</figcaption></figure></div><p>Using an in-memory cache such as Redis or
Memcached lets your API answer repeated requests from memory instead of running the same expensive database queries again.</p><p>Consider implementing multiple caching layers:</p><ul><li><p>Client-side caching for static assets</p></li><li><p>Server-side caching for database query results</p></li><li><p>HTTP caching headers (such as <code>Cache-Control</code> and <code>ETag</code>) to leverage browser and CDN caches</p></li></ul><p>Cache invalidation is equally important: make sure you have a strategy to refresh cached data when it becomes stale.</p><p>Most applications combine time-based expiration (TTL) with event-based invalidation to keep data consistent while maximizing performance gains.</p><h4><strong>2. Pagination</strong></h4><p>Large datasets can significantly slow down your API responses.</p><p>Pagination breaks large result sets into smaller, manageable chunks that are easier to transmit and process.</p><p>By limiting the number of records returned per request, you reduce bandwidth consumption, decrease memory usage on both client and server, and dramatically improve response times.</p><p>Prefer cursor-based pagination for large datasets. It handles insertions and deletions between requests more gracefully than offset-based pagination, and (given an index on the cursor column) the database can seek straight to the cursor instead of scanning and discarding every skipped row, as a large <code>OFFSET</code> forces it to.</p><p>Always set sensible default page sizes and let clients configure limits within acceptable boundaries. This also improves user experience by delivering data progressively rather than making users wait for massive responses.</p><h4><strong>3. Avoid N+1 Queries</strong></h4><p>The N+1 query problem occurs when your application executes one query to fetch parent records, then executes N additional queries, one per parent record, to fetch related data.</p><p>This can turn a simple operation into dozens or hundreds of unnecessary database round trips, severely degrading performance.</p><p>Use eager loading (such as a SQL <code>JOIN</code>, or a single <code>WHERE ... IN (...)</code> query for all related rows) or request batching (such as the DataLoader pattern in GraphQL) to fetch all the required data in one or two queries. 
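</p><p>To make the difference concrete, here is a minimal Python sketch using the standard-library <code>sqlite3</code> module and a made-up <code>authors</code>/<code>posts</code> schema (the tables, data, and function names are hypothetical, chosen only for illustration). The first function triggers the N+1 pattern; the second fetches the same data with a single <code>LEFT JOIN</code>.</p><pre><code>import sqlite3


def setup_demo_db():
    """Create a tiny in-memory database of authors and posts (made-up demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Linus'), (3, 'Grace');
        INSERT INTO posts VALUES (1, 1, 'Engines'), (2, 1, 'Notes'),
                                 (3, 2, 'Kernels'), (4, 3, 'Compilers');
    """)
    return conn


def posts_by_author_n_plus_1(conn):
    """Anti-pattern: 1 query for the authors, then 1 more query per author."""
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors ORDER BY id"):
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def posts_by_author_join(conn):
    """Eager loading: one JOIN query, no matter how many authors there are."""
    rows = conn.execute("""
        SELECT a.name, p.title
        FROM authors AS a LEFT JOIN posts AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """)
    result = {}
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # LEFT JOIN returns NULL for authors with no posts
            titles.append(title)
    return result
</code></pre><p>With the three demo authors, the first function sends four queries and the second sends one; with 1,000 authors it would be 1,001 queries versus one. The <code>LEFT JOIN</code> keeps authors who have no posts. If the join would repeat a lot of parent data in every row, a second query using <code>WHERE author_id IN (?, ?, ?)</code> is a common alternative that still avoids N+1.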
</p><p>ORMs like Sequelize, Hibernate, and TypeORM offer eager-loading options (Sequelize&#8217;s <code>include</code>, Hibernate&#8217;s <code>JOIN FETCH</code>, TypeORM&#8217;s <code>relations</code>) that help you avoid N+1 queries, but they only help if you use them.</p><p>Regularly audit your API logs and database query patterns to find and eliminate these bottlenecks before they affect production performance.</p><h4><strong>4. Compression</strong></h4><p>Compressing API responses with gzip or Brotli can shrink text-based payloads such as JSON considerably, often by more than half, which cuts bandwidth consumption and network transmission time.</p><p>Modern HTTP servers and client libraries handle compression transparently (negotiated through the <code>Accept-Encoding</code> and <code>Content-Encoding</code> headers), making it one of the easiest performance wins.</p><p>Enable compression at the server level for all text-based responses, including JSON, XML, and HTML. Choose compression levels that suit your infrastructure: higher levels save more bandwidth but consume more CPU.</p><p>Monitor the trade-off between compression overhead and transmission savings. The benefits are most pronounced for larger payloads, while small responses might not justify the computational cost.</p><h4><strong>5. Connection Pooling</strong></h4><p>Every new database connection carries overhead: network setup, authentication, and resource allocation on the database server. Opening a new connection for each request is wasteful and adds unnecessary latency.</p><p>Connection pooling maintains a pool of pre-established database connections that are reused across requests, significantly reducing that overhead.</p><p>Size the pool based on your application&#8217;s concurrency requirements and your database&#8217;s capacity. Too small a pool creates bottlenecks, while too large a pool wastes resources and can overload the database.</p><p>Most modern backend frameworks and database drivers include connection pooling out of the box, but tuning parameters like minimum pool size, maximum pool size, and connection timeout helps you get the best performance for your specific workload.</p><h4><strong>6. 
Serialization</strong></h4><p>Converting application objects into transmittable formats (JSON, XML, Protocol Buffers) consumes CPU. Optimizing serialization reduces this overhead and speeds up response times.</p><p>Choose efficient serialization formats, and consider compact binary alternatives like Protocol Buffers or MessagePack for performance-critical APIs.</p><p>Serialize only the fields a client actually needs, which reduces both payload size and serialization time.</p><p>Use streaming serialization for large responses to avoid holding entire datasets in memory. Benchmark different serialization strategies in your environment to find the approach that best balances performance, bandwidth, and compatibility for your use case.</p><h4><strong>7. Use Asynchronous Logging</strong></h4><p>Synchronous logging writes each log entry to disk before the application continues, so every log call becomes blocking I/O that slows down request processing.</p><p>Asynchronous logging buffers log entries in memory and writes them to disk on a separate thread or process, so your application keeps handling requests without waiting for disk I/O.</p><p>Implement async logging with message queues or with logging libraries designed for high-throughput scenarios.</p><p>This approach not only improves API response times but also prevents a single slow I/O operation from stalling your entire application. Size your buffers appropriately, decide what happens when a buffer fills up, and flush it on shutdown so you don&#8217;t lose log entries during high-load periods.</p><div><hr></div><p><strong>Did you learn anything new from this newsletter this week? Please reply to this email and let me know. Feedback like this encourages me to keep going.</strong></p><p>Remember to start learning backend engineering from our courses:<br><br><strong>Get a 50% discount on any of these courses. 
Reach out to me (Reply to this mail)</strong></p><ol><li><p><a href="https://masteringbackend.com/courses/become-a-python-backend-engineer">Become a Python Backend Engineer is Live</a></p></li><li><p><a href="https://masteringbackend.com/courses/become-a-java-spring-backend-engineer">Become a Java + Spring Backend Engineer is Live</a></p></li></ol><div><hr></div><h2><strong>Backend Engineering Resources</strong></h2><ol><li><p><a href="https://masteringbackend.com/hubs/backend-engineering">Backend Engineering Hub</a></p></li><li><p><a href="https://masteringbackend.com/books">All Backend Books</a></p></li><li><p><a href="https://store.masteringbackend.com/">Visit our Backend Store</a></p></li><li><p><a href="https://masteringbackend.com/community">Join our Community</a></p></li><li><p><a href="https://masteringbackend.com/courses">Backend Engineering Courses</a></p></li></ol><div><hr></div><h3><strong>Whenever you&#8217;re ready</strong></h3><p><strong>There are 4 ways I can help you become a great backend engineer:</strong></p><p><strong>1.</strong> <strong><a href="https://app.masteringbackend.com">The MB Platform:</a> </strong>Join 1000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.</p><p><strong>2. <a href="https://masteringbackend.com/academy">The MB Academy: </a></strong>The &#8220;MB Academy&#8221; is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.</p><p><strong>3. 
<a href="https://masteringbackend.com/courses">MB Video-Based Courses:</a></strong> Join 1000+ backend engineers who learn from our meticulously crafted courses designed to empower you with the knowledge and skills you need to excel in backend development.</p><p><strong>4. <a href="https://getbackendjobs.com?ref=backend-weekly">GetBackendJobs:</a></strong> Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.</p><div><hr></div><p><strong>LAST WORD </strong>&#128075; </p><p><strong>How am I doing?</strong></p><p>I love hearing from readers, and I&#8217;m always looking for feedback. How am I doing with The Backend Weekly? Is there anything you&#8217;d like to see more or less of? Which aspects of the newsletter do you enjoy the most?</p><p>Hit reply and say hello - I&#8217;d love to hear from you!</p><p>Stay awesome,<br>Solomon</p>]]></content:encoded></item></channel></rss>