80% is 100%
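A common mistake in capacity planning is assuming you can run a server or a database at close to 100% CPU utilization. Intuition suggests that if you have capacity for 100 requests per second and you are receiving 90, you are fine. Queueing theory says otherwise. In the simplest single-server model (M/M/1), mean response time is the service time divided by (1 - utilization), so at 90% utilization a request takes on average ten times longer than it would on an idle server.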
The Hockey Stick Graph
Response time does not increase linearly with utilization. It grows slowly up to about 70-80% utilization, then climbs steeply, approaching infinity as utilization approaches 100%. Why? Because variability exists. Requests don't arrive at perfectly spaced intervals. They arrive in bursts.
At high utilization, a tiny burst creates a queue. Because the CPU has almost no spare capacity, it drains that queue only slightly faster than new requests arrive, so the backlog lingers and the next burst lands on top of it. Once arrivals reach or exceed capacity, the queue grows without bound.
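The shape of the curve can be sketched with the textbook M/M/1 formula, R = S / (1 - utilization). This is a minimal illustration, not a model of any particular system: it assumes a single server with random (Poisson) arrivals and exponentially distributed service times, and the 10 ms service time and the function name `mm1_response_time` are made up for the example.

```python
def mm1_response_time(service_time_s: float, utilization: float) -> float:
    """Mean time in system (queueing + service) for an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1); at 1.0 the queue never drains")
    return service_time_s / (1.0 - utilization)


SERVICE_TIME_S = 0.010  # assumed: 10 ms of work per request

for rho in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    r = mm1_response_time(SERVICE_TIME_S, rho)
    multiplier = r / SERVICE_TIME_S
    print(f"{rho:4.0%} utilization -> {r * 1000:7.1f} ms ({multiplier:5.1f}x service time)")
```

Under these assumptions the multiplier is 2x at 50%, 5x at 80%, 10x at 90%, and 100x at 99%. Real systems with multiple cores or less variable traffic bend later, but the shape is the same.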
Vertical vs Horizontal Scaling
- Vertical: Buy a bigger server. Easy, but has a hard limit.
- Horizontal: Add more servers behind a load balancer. Harder architecture (statelessness required), but it scales much further, until a shared dependency such as the database becomes the bottleneck.
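When scaling horizontally, the utilization ceiling belongs in the sizing arithmetic. The sketch below is a rough back-of-the-envelope helper, assuming load is spread evenly across identical servers. The name `servers_needed` and the 70% default target are illustrative choices, not values from any specific tool.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float,
                   target_utilization: float = 0.7) -> int:
    """Servers required so that each one stays at or below the target utilization."""
    if peak_rps < 0 or per_server_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid sizing inputs")
    # Usable capacity per server is only a fraction of its benchmarked maximum.
    usable_rps = per_server_rps * target_utilization
    return max(1, math.ceil(peak_rps / usable_rps))


# 900 req/s peak, each server benchmarked at 100 req/s:
print(servers_needed(900, 100, target_utilization=1.0))  # naive sizing at 100%
print(servers_needed(900, 100, target_utilization=0.7))  # sizing with headroom
```

The naive calculation gives 9 servers and the one with headroom gives 13. The difference is the cost of staying on the flat part of the curve.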
The Thundering Herd
A classic failure mode, in which retries amplify the original overload:
- Database slows down (high CPU).
- Web servers time out waiting for DB.
- Users refresh the page (retries).
- Web servers send more requests to the dying DB.
- Total Collapse.
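The feedback loop in the steps above can be put into numbers with a simplified model. It assumes every failed attempt is retried immediately, up to a fixed number of retries, and that each attempt fails independently with the same probability. The function name `load_amplification` is hypothetical.

```python
def load_amplification(failure_rate: float, max_retries: int) -> float:
    """Expected attempts sent to the backend per original user request.

    Attempt i only happens if the previous i attempts all failed, which has
    probability failure_rate ** i, so the total is a finite geometric sum.
    """
    if not 0 <= failure_rate <= 1 or max_retries < 0:
        raise ValueError("failure_rate must be in [0, 1] and max_retries >= 0")
    return sum(failure_rate ** i for i in range(max_retries + 1))


for p in (0.0, 0.5, 0.9, 1.0):
    print(f"failure rate {p:4.0%}, 3 retries -> {load_amplification(p, 3):.2f}x load")
```

In this model a fully failing database with three retries per request receives four times its normal traffic, exactly when it can least afford it.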
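To prevent this collapse, you need circuit breakers, which stop sending requests to a dependency that is already failing, and exponential backoff (ideally with random jitter), which spreads retries out instead of sending them all at once.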
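Both ideas fit in a few dozen lines. The following is a minimal single-threaded sketch, not a production implementation: the class and function names (`CircuitBreaker`, `backoff_delay`, `call_with_retries`), the thresholds, and the "full jitter" variant of backoff are all assumptions chosen for illustration. A real service would more likely use an established resilience library and would need locking for concurrent callers.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being tried."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Timeout elapsed: let a trial call through (half-open state).
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None
        return result


def backoff_delay(attempt, base_s=0.1, cap_s=10.0, rng=random.random):
    """Full-jitter backoff: a random delay up to min(cap_s, base_s * 2**attempt)."""
    return rng() * min(cap_s, base_s * (2 ** attempt))


def call_with_retries(fn, breaker, max_attempts=4, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # never retry against an open circuit
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
```

A caller would wrap each database query as `call_with_retries(run_query, breaker)`, where `run_query` is a hypothetical zero-argument function, and treat `CircuitOpenError` as a signal to serve a cached or degraded response rather than wait.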
Use the Capacity Planner to visualize where your breaking point is. It's always sooner than you think.