System Design: Scalability and Performance

1. Vertical vs. Horizontal Scaling
- Vertical Scaling (Scaling UP): Buying a bigger server with more CPU and RAM.
  - Pros: Usually no code changes. Simple.
  - Cons: There is a limit to how big a single server can be. It gets expensive quickly, and the one machine remains a "Single Point of Failure."
- Horizontal Scaling (Scaling OUT): Adding 10 small servers instead of 1 big one.
  - Pros: Near-limitless scale in principle. High availability, because the other servers keep working when one fails.
  - Cons: Requires a Load Balancer (Module 182), and your code must be "Stateless."
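To make the load balancer requirement concrete, here is a minimal sketch of round-robin distribution, one common strategy among several. The class name `RoundRobinBalancer` and the server names are hypothetical, chosen only for illustration; a real load balancer also handles health checks, timeouts, and network I/O.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotating order (illustrative only)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(list(servers))

    def pick(self):
        # Each call returns the next server in the rotation.
        return next(self._rotation)


# Hypothetical server names; six requests spread evenly over three servers.
balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

Because consecutive requests land on different servers, no single server can be assumed to remember a given user.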
2. Statelessness: The Rule of Cloud
If you scale horizontally, User A might talk to Server 1 now, and Server 2 one minute later.
- If Server 1 saved the "Cart" in its internal RAM, Server 2 won't know about it.
- The Rule: Never store session data in your server's memory. Store it in a Central Cache (Redis) or a Database.
- This makes your servers "Replaceable." You can shut down 5 servers at midnight to save money without losing any user data.
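The rule can be sketched as follows. This is a minimal illustration, not a production design: `SessionStore` is a hypothetical in-memory stand-in for a central cache such as Redis (so the example runs without any external service), and `StatelessServer`, the session ID, and the cart contents are likewise made up for the example.

```python
class SessionStore:
    """Stand-in for a central cache such as Redis (in-memory so the sketch runs anywhere)."""

    def __init__(self):
        self._data = {}

    def get_cart(self, session_id):
        # Return a copy so callers cannot mutate the stored state by accident.
        return list(self._data.get(session_id, []))

    def set_cart(self, session_id, cart):
        self._data[session_id] = list(cart)


class StatelessServer:
    """Keeps no per-user data of its own; every read and write goes to the shared store."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def add_to_cart(self, session_id, item):
        cart = self._store.get_cart(session_id)
        cart.append(item)
        self._store.set_cart(session_id, cart)
        return cart

    def view_cart(self, session_id):
        return self._store.get_cart(session_id)


store = SessionStore()
server_1 = StatelessServer("server-1", store)
server_2 = StatelessServer("server-2", store)

server_1.add_to_cart("user-a", "book")   # User A talks to Server 1 first
del server_1                             # Server 1 is shut down
cart_seen_by_server_2 = server_2.view_cart("user-a")
print(cart_seen_by_server_2)             # ['book']
```

Server 2 sees the cart even though Server 1 handled the write and is gone, which is what makes the servers replaceable.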
3. Reliability vs. Availability
- Availability: Is the service up? (99.9% allows about 8.8 hours of downtime a year.)
- Reliability: Does the service do what it's supposed to do, correctly? An available system can still be unreliable if it gives the wrong bank balance half the time.
In 2026, critical services often aim for "Five Nines" (99.999% availability), which allows only about 5.3 minutes of downtime per YEAR.
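The downtime budget for any availability target is simple arithmetic. The sketch below assumes a 365-day year; the function name `downtime_minutes_per_year` is introduced here for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Minutes per year a service may be down while still meeting the target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} min/year ({minutes / 60:.2f} hours)")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why every additional nine is much harder to reach than the last.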
4. Latency vs. Throughput
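- Latency: How long ONE request takes. (e.g., 50ms).
- Throughput: How many requests the system handles per second in total. (e.g., 5,000 req/sec).
The Trade-off: Sometimes, to increase Throughput (handling more users), you have to accept slightly higher Latency (individual requests are slower). Batching is a typical case: processing requests in groups spreads fixed overhead across many requests, but each request finishes only when its whole batch does. Mastering this balance is the mark of a Senior System Designer.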
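One way to see the trade-off is a toy batching model. The numbers below (10 ms of fixed overhead per batch, 1 ms of work per request) are assumptions picked for illustration, not measurements, and the model ignores the time a request spends waiting for its batch to fill.

```python
def batch_metrics(batch_size, fixed_overhead_ms=10.0, per_request_ms=1.0):
    """Return (latency_ms, throughput_per_sec) for one worker processing batches.

    Assumed model: every batch pays a fixed overhead once, plus a cost per request.
    """
    batch_time_ms = fixed_overhead_ms + batch_size * per_request_ms
    latency_ms = batch_time_ms                       # a request finishes when its batch does
    throughput_per_sec = batch_size / batch_time_ms * 1000
    return latency_ms, throughput_per_sec


for size in (1, 10, 100):
    latency, throughput = batch_metrics(size)
    print(f"batch={size:>3}  latency={latency:6.1f} ms  throughput={throughput:7.1f} req/sec")
```

Under these assumptions, larger batches raise throughput and latency together: the fixed overhead is amortized over more requests, but every request waits for the whole batch.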
Frequently Asked Questions
Is scalability just for big companies? No. Scaling "Down" is just as important. In 2026, Serverless platforms (Module 174) let a small startup scale from zero to thousands of users within seconds, then back to zero at night, so it pays almost nothing for compute while idle.
What is the 'Single Point of Failure'? It is any component whose failure takes down the whole system. If everything relies on ONE database or ONE load balancer, the system is not highly available, however well the rest of it scales: if that one part dies, the whole site dies. The fix is "Redundancy": having a backup for every critical component.
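A rough way to quantify what redundancy buys is sketched below. It rests on two simplifying assumptions: replicas fail independently of each other, and the service stays up as long as at least one replica is up. Real failures are often correlated (shared power, network, or a bad deploy), so treat the result as an upper bound. The function name `combined_availability` is hypothetical.

```python
def combined_availability(single_availability, replicas):
    """Availability of N redundant replicas, assuming independent failures.

    The service is down only when every replica is down at the same time.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1 - (1 - single_availability) ** replicas


for n in (1, 2, 3):
    print(f"{n} replica(s) at 99% each -> {combined_availability(0.99, n):.6f}")
```

Under these assumptions, two replicas that are each 99% available give roughly 99.99% together.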
Key Takeaway
Scalability is the "Engine of Growth." By mastering Horizontal scaling and the discipline of Statelessness, you gain the ability to grow your application from a tiny prototype into a global platform without ever hitting a "Capacity Wall." You graduate from "Managing a server" to "Orchestrating a Fleet."
