Scaling, Load Balancing & Statelessness

Vertical vs horizontal scaling, why stateless services are the unlock, load-balancer types and algorithms, and how to handle sessions when any server can take any request.

Two ways to scale

Vertical (scale up): a bigger box (more CPU/RAM). Simple, no app changes — but there's a hard ceiling and it's a single point of failure.
Horizontal (scale out): more boxes behind a load balancer. Near-unlimited and fault-tolerant, but requires your services to be stateless and your data layer to handle concurrency.

Real systems do both: scale up until it's uneconomical, then scale out.

Statelessness is the unlock

If a server stores session state in memory, request #2 must hit the same server (sticky sessions), which skews load distribution and loses the session when that box dies. Fix: keep app servers stateless and push state to a shared store.

Now any server can serve any request; you add/remove servers freely, and a crash loses no session. This is exactly why StockVision puts JWT/session data in cookies + Redis rather than server memory — it can run behind N identical nodes.
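A minimal sketch of that idea, not StockVision's actual code. `SessionStore` is a hypothetical in-memory stand-in for a shared store such as Redis, and `AppServer`, `login`, and `whoami` are illustrative names. The point is only that the server object holds no session data of its own, so a second request can land on a different node, or the first node can disappear, without losing the session.

```python
import secrets


class SessionStore:
    """Hypothetical in-memory stand-in for a shared store such as Redis."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, session):
        self._data[session_id] = dict(session)

    def get(self, session_id):
        return self._data.get(session_id)


class AppServer:
    """Stateless app server: keeps no session data between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # the same store object is shared by every server

    def login(self, user):
        session_id = secrets.token_hex(16)  # would travel back in a cookie
        self.store.put(session_id, {"user": user})
        return session_id

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None


store = SessionStore()
servers = [AppServer(f"app-{i}", store) for i in range(3)]

sid = servers[0].login("alice")             # request 1 lands on app-0
user_from_other = servers[2].whoami(sid)    # request 2 lands on app-2
servers.pop(0)                              # app-0 "crashes"
user_after_crash = servers[0].whoami(sid)   # the session is still there
```

Had `login` written to a dictionary on the server object instead, the second lookup would have failed on any node other than `app-0`.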

Trace a request down the tier ladder: the load balancer round-robins to a stateless app server, which checks the cache first and only falls through to the database on a miss. A repeated key is then served from the warm cache instead.

Request flow: Client → load balancer → stateless app servers (×3) → cache → DB. A cache hit costs ≈ O(1).

Example: for GET /u/7 with an empty cache, the load balancer routes to app server 0 (round-robin). Servers are stateless, so any one can serve it.
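The request flow described above could be sketched roughly as follows. Every name here (`Database`, `AppServer`, `LoadBalancer`, the `/u/7` rows) is made up for illustration, and a plain dictionary stands in for the shared cache. The sketch assumes three app servers and a cache-aside read path.

```python
from itertools import cycle


class Database:
    """Hypothetical source of truth; counts how often it is actually read."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


class AppServer:
    """Stateless: all it holds are references to the shared cache and DB."""

    def __init__(self, server_id, cache, db):
        self.server_id = server_id
        self.cache = cache
        self.db = db

    def handle(self, key):
        if key in self.cache:                 # warm cache: no DB round trip
            return self.server_id, "hit", self.cache[key]
        value = self.db.get(key)              # miss: fall through to the DB
        self.cache[key] = value               # populate for the next request
        return self.server_id, "miss", value


class LoadBalancer:
    """Round-robin over the app servers."""

    def __init__(self, servers):
        self._next = cycle(servers)

    def route(self, key):
        return next(self._next).handle(key)


db = Database({"/u/7": "user 7", "/u/9": "user 9"})
cache = {}
lb = LoadBalancer([AppServer(i, cache, db) for i in range(3)])

results = [lb.route(key) for key in ["/u/7", "/u/9", "/u/7", "/u/7"]]
for server_id, outcome, value in results:
    print(f"server={server_id} cache={outcome} value={value}")
```

The third and fourth requests repeat `/u/7`, land on different servers, and are both served from the cache, so the database is read only twice.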

Load balancers

L4 (transport): routes by IP/port; fast, protocol-agnostic, no payload awareness.
L7 (application): routes by URL/header/cookie; enables path-based routing, TLS termination, and smarter health checks. Most web stacks use L7.

Algorithms: round-robin, least-connections, weighted, and consistent hashing (route the same key to the same node — vital for cache affinity). Always pair with health checks so dead nodes are pulled out.
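As a rough illustration of three of these algorithms, here is a small sketch. The `Node`, `RoundRobin`, `least_connections`, and `ConsistentHashRing` names are hypothetical, health is a plain boolean flag rather than a real probe, and weighting is left out. The ring places 100 virtual points per node, which is an arbitrary choice for the example.

```python
import bisect
import hashlib
from itertools import count


class Node:
    def __init__(self, name):
        self.name = name
        self.healthy = True   # would be set by a health checker
        self.active = 0       # currently open connections


def healthy(nodes):
    return [n for n in nodes if n.healthy]


class RoundRobin:
    def __init__(self, nodes):
        self.nodes = nodes
        self._i = count()

    def pick(self):
        pool = healthy(self.nodes)            # dead nodes are skipped
        return pool[next(self._i) % len(pool)]


def least_connections(nodes):
    return min(healthy(nodes), key=lambda n: n.active)


def _hash(value):
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, names, vnodes=100):
        # Each node owns many points on the ring to even out the spread.
        self._ring = sorted(
            (_hash(f"{name}#{v}"), name) for name in names for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]


nodes = [Node("a"), Node("b"), Node("c")]
rr = RoundRobin(nodes)
order = [rr.pick().name for _ in range(4)]

nodes[1].healthy = False                      # health check pulls "b" out
after_failure = [rr.pick().name for _ in range(4)]

nodes[0].active, nodes[2].active = 5, 2
least_loaded = least_connections(nodes).name

ring = ConsistentHashRing(["a", "b", "c"])
smaller = ConsistentHashRing(["a", "c"])      # same ring after "b" leaves
keys = [f"user:{i}" for i in range(200)]
moved = [k for k in keys if ring.pick(k) != smaller.pick(k)]
```

The keys in `moved` are exactly those that lived on `b`. Keys owned by `a` or `c` stay where they were, which is the property that keeps caches warm when the node set changes.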

The LB itself can be a SPOF

A single load balancer is a single point of failure. Run it in an active-passive (or active-active) pair with a floating IP / DNS failover, or use a managed LB that's redundant by design.
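The active-passive decision can be modelled in a few lines. This is only a toy: the `FloatingIP` class, the `lb-1`/`lb-2` names, and the threshold of three missed heartbeats are assumptions for illustration, and a real pair would rely on a failover protocol or a managed service rather than hand-written logic.

```python
class FloatingIP:
    """Hypothetical model of a virtual IP that exactly one LB owns at a time."""

    def __init__(self, active, standby, max_missed=3):
        self.owner = active
        self.standby = standby
        self.max_missed = max_missed
        self.missed = 0

    def heartbeat(self, owner_alive):
        """Record one health probe of the current owner; fail over if needed."""
        if owner_alive:
            self.missed = 0
            return self.owner
        self.missed += 1
        if self.missed >= self.max_missed:
            # Promote the standby; the old owner becomes the new standby.
            self.owner, self.standby = self.standby, self.owner
            self.missed = 0
        return self.owner


vip = FloatingIP(active="lb-1", standby="lb-2")
probes = [True, False, False, True, False, False, False]
owners = [vip.heartbeat(alive) for alive in probes]
```

Two missed probes followed by a good one do not trigger failover; only three consecutive misses move the address to `lb-2`. Requiring several misses trades a slower failover for fewer false alarms.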

Don't forget the data tier

Scaling app servers is easy; the database is usually the real bottleneck. The standard ladder: cache reads → read replicas for read scale → shard when writes or storage exceed one box (covered in Databases at scale).
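The first rungs of that ladder might look like the sketch below. `DataTier` and `shard_for` are hypothetical names, dictionaries stand in for the primary and its replicas, and replication is done synchronously inside `write` purely to keep the example short; real replicas usually lag behind the primary.

```python
import hashlib
from itertools import cycle


class DataTier:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next_replica = cycle(replicas)
        self.cache = {}

    def write(self, key, value):
        self.primary[key] = value             # all writes go to the primary
        for replica in self.replicas:         # simplification: no lag
            replica[key] = value
        self.cache.pop(key, None)             # invalidate the stale entry

    def read(self, key):
        if key in self.cache:                 # rung 1: cache reads
            return self.cache[key], "cache"
        value = next(self._next_replica).get(key)   # rung 2: read replicas
        self.cache[key] = value
        return value, "replica"


def shard_for(key, shard_count):
    """Rung 3: pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


tier = DataTier(primary={}, replicas=[{}, {}])
tier.write("user:7", "Ada")
first = tier.read("user:7")
second = tier.read("user:7")
shard = shard_for("user:7", 4)
```

The first read is served by a replica and the second by the cache. Simple modulo sharding like `shard_for` remaps most keys when the shard count changes, which is one reason resharding is costly.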

Design drills

Scaling is a ladder you climb under pressure. Practise naming the next rung and its cost.

Whiteboard each one out loud for 5–10 minutes. The gap between your sketch and what a strong answer covers is your study list.

Warm-up: One app server is at 80% CPU. What's the cheapest next move, and when does it stop working?

Core: Reads are the bottleneck at 50k QPS with a 100:1 read:write ratio. Walk the read-scaling ladder in order.

Core: The load balancer is now a single point of failure. How do you make the entry tier itself highly available?

Stretch: Writes now exceed what one primary can handle. What changes, and what do you lose?
