Scaling Beyond 10k Requests/Sec: The Database Bottleneck
Introduction
When your application (which in 1970 is just a guy named Bob with a very fast abacus) starts handling more than 10,000 requests per second, the rules of the game change. No longer can you rely on a single, monolithic Bob to gracefully handle the load.
In this post, we’ll explore why Bob becomes the first major bottleneck and what strategies engineers can use to mitigate it.
The Symptoms
- Increased Latency: Answers that used to take 10s now take 500s. Bob is sweating.
- Connection Starvation: People waiting in line to talk to Bob simply give up and go home.
- CPU Pegging: Bob’s Central Processing Unit (his brain) hits 100% simply trying to context switch between calculating and drinking coffee.
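To see why slower answers turn into a longer line, a rough back-of-the-envelope sketch using Little's law (average requests in flight = arrival rate × average latency) may help. It assumes a steady arrival rate and a stable queue, which an overloaded Bob may not satisfy, and the function name `requestsInFlight` is made up for this illustration.

```javascript
// Little's law: average requests in flight = arrival rate * average latency.
// Assumes steady-state traffic; real queues under overload behave worse.
function requestsInFlight(requestsPerSecond, latencySeconds) {
  return requestsPerSecond * latencySeconds;
}

// Using the figures from this post: 10,000 requests/sec at 10s and at 500s.
console.log(requestsInFlight(10000, 10));  // 100000
console.log(requestsInFlight(10000, 500)); // 5000000
```

At the same arrival rate, a 50x jump in latency means 50x more people standing in line, which is where connection starvation comes from.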
Solutions
Caching
The first line of defense is always caching. Give Bob a notepad. If someone asks for the same calculation, he can just look at his notes instead of using the abacus again.
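// Example of a basic notepad cache layer
function askBob(question) {
  // Check the notepad first; a miss is assumed to come back as undefined.
  const cachedAnswer = readNotepad(question);
  // Compare against undefined so valid falsy answers (0, "") still count as hits.
  if (cachedAnswer !== undefined) return cachedAnswer;

  // Cache miss: do the slow work, then note the answer for next time.
  const answer = computeWithAbacus(question);
  writeToNotepad(question, answer);
  return answer;
}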
Read Replicas (Hiring Alice)
For read-heavy workloads, hiring Alice and cloning Bob’s notepad allows you to direct SELECT queries away from Bob.
Design Note: Read replicas introduce replication lag. Alice might be reading a stale notepad if Bob hasn’t updated it yet.
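Here is a minimal sketch of what that routing, and the lag that comes with it, might look like. Everything in it is a stand-in: `bob` and `alice` are in-memory Maps rather than real databases, `replicate()` fakes one round of replication, and the `fresh` option is a made-up name for "this read cannot tolerate stale data."

```javascript
// Hypothetical in-memory stand-ins for a primary (Bob) and a replica (Alice).
const bob = new Map();   // primary: receives every write
const alice = new Map(); // replica: a copy of Bob's notepad that may lag behind

// Writes always go to the primary.
function write(key, value) {
  bob.set(key, value);
}

// Simulates one round of replication: Alice copies Bob's notepad.
function replicate() {
  for (const [key, value] of bob) alice.set(key, value);
}

// Reads go to the replica by default; callers that cannot tolerate
// stale data can ask for the primary explicitly.
function read(key, { fresh = false } = {}) {
  return (fresh ? bob : alice).get(key);
}

write("2+2", 4);
console.log(read("2+2"));                  // undefined: Alice has not caught up yet
console.log(read("2+2", { fresh: true })); // 4: read straight from Bob
replicate();
console.log(read("2+2"));                  // 4: Alice has the update now
```

Sending only the reads that truly need fresh data back to Bob keeps most of the load on Alice, at the cost of deciding, per query, how stale is too stale.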
Vertical vs Horizontal Scaling
Before sharding (horizontal scaling, which means chopping Bob into smaller pieces — do not recommend), ensure you have exhausted vertical scaling options (giving Bob more coffee).
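For the curious, here is a rough sketch of what chopping Bob up involves: every key has to be routed to exactly one piece of Bob, every time. The hash function, the shard count of three, and the names `hashKey`, `shardFor`, `put`, and `get` are all assumptions made for illustration, not a recommendation for a real system.

```javascript
// Hypothetical shard router: maps each key to one of N shards ("Bob pieces").
function hashKey(key) {
  // Simple 32-bit string hash (illustrative only, not production grade).
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// The same key always lands on the same shard for a given shard count.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Three in-memory Maps standing in for three separate database servers.
const shards = [new Map(), new Map(), new Map()];

function put(key, value) {
  shards[shardFor(key, shards.length)].set(key, value);
}

function get(key) {
  return shards[shardFor(key, shards.length)].get(key);
}

put("customer:42", "owes Bob a coffee");
console.log(get("customer:42")); // owes Bob a coffee
```

Even this toy version hints at the operational cost: with modulo routing, changing the number of shards moves most keys to a different shard, and any question that spans keys has to be asked of every piece of Bob.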
Conclusion
Scaling is a journey. Start simple with notepads and Alice before venturing into complex human cloning architectures. Your operational sanity will thank you.