Design a Chat System (WhatsApp LLD)

The object design under the messaging giant: conversations, the message delivery state machine, local-first storage, and sync after offline.

LLD · OOD · state-machine · sync

Scope it first

"The object design of a WhatsApp-like client and its server-side session layer: 1:1 and group conversations, message states (sent / delivered / read), offline users syncing on return, typing indicators. The distributed infrastructure is the HLD doc — here we design the classes that ride on it. OK?"

That split is itself the lesson: the HLD answered how a billion messages move; this answers what a message is — and chat is the rare LLD where the client is the harder half, because phones are offline-first databases with a UI.

Entities & relationships

Decisions to narrate as you draw:

  • Conversation is abstract; group vs direct are subclasses — groups add membership rules and admin powers; the message flow stays identical, so all delivery code works on the base type (polymorphism earning rent).
  • Message.id is a client-generated UUID — created before any network contact, so retries dedup server-side and the message has identity even while offline. The single most load-bearing field in the design (the WhatsApp HLD's dedup key, born here).
  • Content as an interface (text/media/location/reply), not a blob of nullable fields — adding "polls" is a new content type, not a schema scar. Media content holds a URL + local cache path, never bytes (blobs ride elsewhere).
  • Ordering by server-assigned seq per conversation, not by clock — phones' clocks lie (sequence numbers, not timestamps); the client sorts and gap-detects on seq.
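The four decisions above can be sketched in a few classes. This is a minimal illustration rather than a reference design: the field names, the `can_post` hook and the `admins_only` flag are assumptions made for the example, standing in for whatever membership rules a real group carries.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional, Set


class Content(ABC):
    """One subclass per payload kind; a new kind (say, polls) is a new subclass."""

    @abstractmethod
    def preview(self) -> str: ...


@dataclass(frozen=True)
class TextContent(Content):
    text: str

    def preview(self) -> str:
        return self.text


@dataclass(frozen=True)
class MediaContent(Content):
    url: str                                 # where the blob lives remotely
    local_cache_path: Optional[str] = None   # set once downloaded; never raw bytes

    def preview(self) -> str:
        return "[media]"


@dataclass
class Message:
    sender_id: str
    content: Content
    # Generated on the client before any network contact: the dedup key.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Assigned by the server on ack; None while the message is only local.
    seq: Optional[int] = None


class Conversation(ABC):
    def __init__(self, conversation_id: str) -> None:
        self.id = conversation_id
        self.messages: List[Message] = []

    @abstractmethod
    def can_post(self, user_id: str) -> bool: ...

    def append(self, message: Message) -> None:
        # Delivery code only ever talks to this base type.
        if not self.can_post(message.sender_id):
            raise PermissionError(f"{message.sender_id} may not post in {self.id}")
        self.messages.append(message)

    def timeline(self) -> List[Message]:
        """Confirmed messages ordered by server seq, then local ones in send order."""
        confirmed = sorted((m for m in self.messages if m.seq is not None),
                           key=lambda m: m.seq)
        pending = [m for m in self.messages if m.seq is None]
        return confirmed + pending


class DirectConversation(Conversation):
    def __init__(self, conversation_id: str, user_a: str, user_b: str) -> None:
        super().__init__(conversation_id)
        self.participants: Set[str] = {user_a, user_b}

    def can_post(self, user_id: str) -> bool:
        return user_id in self.participants


class GroupConversation(Conversation):
    def __init__(self, conversation_id: str, members: Set[str], admins: Set[str]) -> None:
        super().__init__(conversation_id)
        self.members = set(members)
        self.admins = set(admins)
        self.admins_only = False   # hypothetical admin power, for illustration

    def can_post(self, user_id: str) -> bool:
        if user_id not in self.members:
            return False
        return (not self.admins_only) or user_id in self.admins
```

Note that `timeline()` never looks at a clock: ordering comes from `seq` alone, and messages that have no `seq` yet simply trail the confirmed ones.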

The message state machine (the ticks)

Every sent message walks one state machine, and every transition has a trigger and an owner:

Pin the diagram next to the chat UI in your head: those states are the clock, the single tick, the double tick, and the blue double tick.

Walk a message through ack → deliver → open and watch the ticks advance. Then try to open (mark read) before it's even delivered — the machine refuses, the same way the UI can't show a blue tick for a message the recipient never received:

Message delivery (the ticks): time O(1) per event, space O(states).
Pending --ack--> Sent --deliver--> Delivered --open--> Read

Start in Pending. Each event is handled by the current state: the State pattern moves this branching out of one giant switch and into the state objects themselves.

The four states:
  • PENDING — exists only locally: rendered in the UI instantly (optimistic UI), sitting in the outbox (below). One grey clock.
  • SENT — the server has it durably; one tick. Note what this is not: no claim about the recipient.
  • DELIVERED — the recipient's device acked; two ticks. In groups: per-participant fan-in — the aggregate shows when all have it, the detail view shows each (Map<participant, DeliveryState> — the state machine runs per recipient).
  • READ — a product event (chat opened), not a transport event; gated by privacy settings. Keeping transport states and product states distinct in your model is a senior tell.
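The walk described above (ack, deliver, open, and the refusal to open an undelivered message) can be sketched with the State pattern. The class names and the `IllegalTransition` exception are assumptions for this example; a production client would likely tolerate duplicate events rather than raise on them.

```python
class IllegalTransition(Exception):
    """Raised when an event is not valid in the current state."""


class State:
    name = "?"

    def on(self, event: str) -> "State":
        # Default for every state: refuse anything not explicitly handled.
        raise IllegalTransition(f"{event!r} is not allowed in {self.name}")


class Pending(State):
    name = "PENDING"          # grey clock: exists only locally

    def on(self, event: str) -> State:
        return Sent() if event == "ack" else super().on(event)


class Sent(State):
    name = "SENT"             # one tick: the server has it durably

    def on(self, event: str) -> State:
        return Delivered() if event == "deliver" else super().on(event)


class Delivered(State):
    name = "DELIVERED"        # two ticks: the recipient's device acked

    def on(self, event: str) -> State:
        return Read() if event == "open" else super().on(event)


class Read(State):
    name = "READ"             # blue ticks: terminal, accepts no events


class MessageDelivery:
    """Holds the current state and delegates every event to it."""

    def __init__(self) -> None:
        self.state: State = Pending()

    def handle(self, event: str) -> str:
        # If the state refuses, the exception propagates and self.state is unchanged.
        self.state = self.state.on(event)
        return self.state.name
```

Usage: `MessageDelivery().handle("open")` raises `IllegalTransition`, because a message that is still PENDING has not been delivered, so it cannot be read.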

Status updates arrive as tiny system messages flowing back along the same pipes — the Observer pattern live: the Message mutates, the conversation view re-renders the tick.
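As a rough sketch of the per-recipient map and the Observer hook, the class below keeps one delivery state per group member and notifies subscribers only when the aggregate tick changes. The names are hypothetical, and one assumption goes beyond the text: the aggregate is taken as the minimum across recipients for READ as well as DELIVERED.

```python
from enum import IntEnum
from typing import Callable, Dict, Iterable, List


class DeliveryState(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


class GroupMessageStatus:
    """Tracks one outgoing message: a small state machine per recipient."""

    def __init__(self, recipients: Iterable[str]) -> None:
        # Assumes at least one recipient; every one starts at SENT.
        self.per_recipient: Dict[str, DeliveryState] = {
            r: DeliveryState.SENT for r in recipients
        }
        self._observers: List[Callable[[DeliveryState], None]] = []

    def subscribe(self, callback: Callable[[DeliveryState], None]) -> None:
        self._observers.append(callback)

    def aggregate(self) -> DeliveryState:
        # The bubble shows the weakest state: two ticks only once all have it.
        return min(self.per_recipient.values())

    def apply_receipt(self, recipient: str, new_state: DeliveryState) -> None:
        current = self.per_recipient[recipient]
        if new_state <= current:
            return                      # duplicate or stale receipt: ignore
        if new_state > current + 1:
            raise ValueError("receipt skips a state")
        before = self.aggregate()
        self.per_recipient[recipient] = new_state
        after = self.aggregate()
        if after != before:             # re-render only when the tick changes
            for notify in self._observers:
                notify(after)
```

The detail view would read `per_recipient` directly, while the conversation view subscribes and redraws the tick from the aggregate.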

The client is a database (outbox + sync)

The design's center of gravity. The phone must work in a tunnel:

  • Local-first store: every conversation's messages live in a local DB (SQLite); the UI reads only local data — network arrival writes to the store, and the store notifies the UI (Observer again). This one decision makes offline reading, instant search and fast startup fall out for free.
  • The outbox pattern: sending = append to local store (PENDING) + enqueue in a persistent outbox. A background SyncEngine drains it — retries with backoff across app restarts; the UUID makes every retry idempotent. Kill the app mid-send; the message still goes. (This is the microservices outbox living in your pocket.)
  • Sync on reconnect: per conversation, the client tracks the last seen seq (a cursor); reconnection asks "everything after cursor" — exactly the offline-inbox drain from the server's perspective, and the Google Drive change-log pattern from the client's. Gap in seqs mid-session → same fetch. One mechanism heals both.
  • Typing indicators are the anti-message: fire-and-forget, never stored, TTL'd (3 s) so a dropped "stopped typing" packet self-heals — explicitly contrasting them with messages shows you classify data by durability need (the Uber location argument).
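The outbox bullet above can be sketched with Python's built-in `sqlite3` module. Everything here is illustrative: `FakeServer` stands in for the session layer, the table layout is an assumption, and retry backoff is left out so that the idempotency point stays visible.

```python
import sqlite3
import uuid


class FakeServer:
    """Stand-in for the session layer: dedups on the client-generated UUID."""

    def __init__(self) -> None:
        self.seq_by_id = {}
        self.next_seq = 1
        self.online = True

    def submit(self, msg_id: str, body: str) -> int:
        if not self.online:
            raise ConnectionError("no signal")
        if msg_id not in self.seq_by_id:       # first time this UUID is seen
            self.seq_by_id[msg_id] = self.next_seq
            self.next_seq += 1
        return self.seq_by_id[msg_id]          # a retry gets the same seq back


class LocalStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS messages "
                        "(id TEXT PRIMARY KEY, body TEXT, state TEXT, seq INTEGER)")
        self.db.execute("CREATE TABLE IF NOT EXISTS outbox (id TEXT PRIMARY KEY)")

    def send(self, body: str) -> str:
        msg_id = str(uuid.uuid4())
        with self.db:   # one transaction: the message row and its outbox row together
            self.db.execute("INSERT INTO messages VALUES (?, ?, 'PENDING', NULL)",
                            (msg_id, body))
            self.db.execute("INSERT INTO outbox VALUES (?)", (msg_id,))
        return msg_id

    def state_of(self, msg_id: str):
        return self.db.execute("SELECT state, seq FROM messages WHERE id = ?",
                               (msg_id,)).fetchone()


class SyncEngine:
    def __init__(self, store: LocalStore, server: FakeServer) -> None:
        self.store = store
        self.server = server

    def drain(self) -> int:
        """Try each queued message once, oldest first; return how many were confirmed."""
        rows = self.store.db.execute(
            "SELECT o.id, m.body FROM outbox o JOIN messages m ON m.id = o.id "
            "ORDER BY o.rowid").fetchall()
        confirmed = 0
        for msg_id, body in rows:
            try:
                seq = self.server.submit(msg_id, body)
            except ConnectionError:
                break   # leave it queued; a real engine would retry with backoff
            with self.store.db:
                self.store.db.execute(
                    "UPDATE messages SET state = 'SENT', seq = ? WHERE id = ?",
                    (seq, msg_id))
                self.store.db.execute("DELETE FROM outbox WHERE id = ?", (msg_id,))
            confirmed += 1
        return confirmed
```

Because the outbox row is removed only after the server has answered, a crash between `submit` and the local update results in a resend, and the UUID check on the server turns that resend into a no-op.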

Think it through like the interview

Think it through: Design a Chat System (LLD)
LLD Classic: offline-first

PROBLEM: Design the classes for a WhatsApp-like client and its session layer: 1:1 and group chats, sent/delivered/read states, offline users syncing on return.

  1. Find the hard half

     Server or client: which side is the real design problem here, and why?

  2. Give the message identity before the network

     Who generates the message id, client or server? This one decision carries the design.

  3. Model the ticks as a state machine

     Grey clock, ✓, ✓✓, blue ✓✓: what are these, precisely?

  4. Make the client a database

     The UI must never block on the network, yet no message may be lost. What architecture squares that?

  5. Sync = cursors, not diffs

     Asha was offline for an hour. How does her phone catch up, and how does the same trick fix mid-session gaps?

Walk a scenario

Asha (in a lift, no signal) types "running late" to the group: UUID m-91f3 created → local store append (PENDING) → outbox → UI shows it instantly, grey clock. Signal returns: SyncEngine drains — server acks, assigns seq 412 → SENT ✓. Server fans out (HLD's job); Rahul's phone acks → his entry in the delivery map flips; when the last member's device acks → DELIVERED ✓✓; reads trickle in per privacy settings. Meanwhile Asha's phone had missed seqs 410–411 (sent while she was offline) — the same reconnect pulled them by cursor, slotting them above hers by seq. Every arrow in that story is a class from the diagram doing one job.
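The catch-up half of that story can be traced in code. This is a simplified sketch under stated assumptions: seqs are contiguous within a conversation, `ConversationLog` is a hypothetical stand-in for the server's per-conversation log, and Asha's cursor is taken to be 409 before she lost signal.

```python
from typing import Dict, List, Tuple


class ConversationLog:
    """Server side: an append-only log for one conversation, keyed by seq."""

    def __init__(self) -> None:
        self.entries: Dict[int, str] = {}

    def append(self, seq: int, body: str) -> None:
        self.entries[seq] = body

    def after(self, cursor: int) -> List[Tuple[int, str]]:
        # "Everything after cursor", in seq order.
        return sorted((s, b) for s, b in self.entries.items() if s > cursor)


class ConversationReplica:
    """Client side: the local copy plus the last seq seen (the cursor)."""

    def __init__(self, cursor: int = 0) -> None:
        self.cursor = cursor
        self.messages: Dict[int, str] = {}

    def catch_up(self, log: ConversationLog) -> List[int]:
        fetched = log.after(self.cursor)
        for seq, body in fetched:
            self.messages[seq] = body
            self.cursor = seq
        return [seq for seq, _ in fetched]

    def on_push(self, seq: int, body: str, log: ConversationLog) -> None:
        if seq <= self.cursor:
            return                     # already have it: duplicate push
        if seq != self.cursor + 1:
            self.catch_up(log)         # gap detected: same fetch as on reconnect
            return
        self.messages[seq] = body
        self.cursor = seq

    def timeline(self) -> List[str]:
        return [self.messages[s] for s in sorted(self.messages)]


log = ConversationLog()
log.append(410, "where are you?")      # sent while Asha was offline
log.append(411, "starting without you")
log.append(412, "running late")        # Asha's own message, once acked

asha = ConversationReplica(cursor=409)
pulled = asha.catch_up(log)            # one request heals the whole hour
```

After `catch_up`, `pulled` holds the three seqs in order and Asha's message sits last in `timeline()`, slotted below 410 and 411 purely by seq. A later push that arrives with a hole before it goes through `on_push` and triggers the same fetch.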

Practice — level up

A chat system is ordered history plus fan-out delivery: append a message, push it to everyone in the room, and let anyone scroll back in order. These drills rehearse each move.

Practice ladder: Fan-out, ordering & history

Climb in order; every rung assumes the one above it. Solve each on LeetCode.

Warm-up — ordered, navigable history

  1. An ordered log you can move through — a single conversation's timeline.

Core — deliver to many, in time order

  1. Fan a post out to followers' feeds — the same push as delivering to room members.

  2. Fetch the value as of a timestamp — message history ordered by time.

Stretch — exactly-once under retries

  1. Suppress a duplicate inside a window — idempotent delivery when the client retries.