Scaling Beyond 10k Requests/Sec: The Database Bottleneck

Jan 1, 1970

Introduction

When your application (which in 1970 is just a guy named Bob with a very fast abacus) starts handling more than 10,000 requests per second, the rules of the game change. No longer can you rely on a single, monolithic Bob to gracefully handle the load.

In this post, we’ll explore why Bob becomes the first major bottleneck and what strategies engineers can use to mitigate it.

The Symptoms

Increased Latency: Answers that used to take 10s now take 500s. Bob is sweating.
Connection Starvation: People waiting in line to talk to Bob simply give up and go home.
CPU Pegging: Bob’s Central Processing Unit (his brain) hits 100% simply trying to context switch between calculating and drinking coffee.

Solutions

Caching

The first line of defense is usually caching. Give Bob a notepad. If someone asks for the same calculation again, he can just look at his notes instead of reaching for the abacus.

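// Example of a basic notepad cache layer.
// readNotepad, writeToNotepad, and computeWithAbacus are assumed to exist elsewhere;
// readNotepad is assumed to return undefined when the question is not in the notepad.
function askBob(question) {
  const cachedAnswer = readNotepad(question);
  // Compare against undefined so that falsy answers such as 0 still count as cache hits.
  if (cachedAnswer !== undefined) return cachedAnswer;

  // Cache miss: do the expensive work once, then remember the result.
  const answer = computeWithAbacus(question);
  writeToNotepad(question, answer);

  return answer;
}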

Read Replicas (Hiring Alice)

For read-heavy workloads, hiring Alice and cloning Bob’s notepad allows you to direct SELECT queries away from Bob.

Design Note: Read replicas introduce replication lag. Alice might be reading a stale notepad if Bob hasn’t updated it yet.
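
One common way to live with that lag is to send a read back to Bob whenever the same key was written very recently, and to Alice otherwise. The sketch below is a minimal illustration under loud assumptions: createRouter, maxLagMs, and the Map objects standing in for Bob and Alice are all hypothetical, and a real setup would use actual database connections plus a measured lag bound.

```javascript
// Hypothetical router: Bob (primary) takes writes, Alice (replica) takes most reads.
// Plain Map objects stand in for real database connections.
function createRouter(primary, replica, maxLagMs) {
  const lastWriteAt = new Map(); // key -> timestamp of the most recent write

  return {
    write(key, value, now = Date.now()) {
      primary.set(key, value);
      lastWriteAt.set(key, now);
    },
    read(key, now = Date.now()) {
      const writtenAt = lastWriteAt.get(key);
      const recentlyWritten =
        writtenAt !== undefined && now - writtenAt < maxLagMs;
      // A recently written key may not have reached the replica yet,
      // so read it from the primary; everything else goes to the replica.
      const source = recentlyWritten ? primary : replica;
      return source.get(key);
    },
  };
}

const bob = new Map();
const alice = new Map([["2+2", 4]]); // replica copy, possibly behind Bob
const router = createRouter(bob, alice, 1000); // assume lag stays under 1000 ms

router.write("3+3", 6, 0); // written at t = 0, not yet copied to Alice
```

Here a read of "3+3" at t = 500 is served by Bob, because Alice may not have the new entry yet, while a read of "2+2" goes to Alice. The trade-off is that Bob still sees some read traffic right after each write.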

Vertical vs Horizontal Scaling

Before sharding (horizontal scaling, which means chopping Bob into smaller pieces, and which we do not recommend), ensure you have exhausted vertical scaling options (giving Bob more coffee).
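
If the coffee does run out, the core of sharding is a deterministic rule that maps each key to one piece of Bob. The following is a rough sketch, not a recommended design: the shard names, the shardFor helper, and the choice of an FNV-1a style hash over UTF-16 code units are all assumptions made for illustration.

```javascript
// Hypothetical shard names; any stable list of shard identifiers works.
const SHARDS = ["bob-0", "bob-1", "bob-2", "bob-3"];

// Deterministic 32-bit string hash in the style of FNV-1a,
// applied to UTF-16 code units rather than bytes.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Math.imul keeps the multiplication in 32 bits; >>> 0 makes it unsigned.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// The same key always maps to the same shard while the shard list is unchanged.
function shardFor(key, shards = SHARDS) {
  return shards[fnv1a(key) % shards.length];
}
```

Plain modulo placement like this is part of why sharding is painful: changing the number of shards moves most keys to a different shard, which is one more reason to exhaust the simpler options first.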

Conclusion

Scaling is a journey. Start simple with notepads and Alice before venturing into complex human cloning architectures. Your operational sanity will thank you.
