MongoDB & Redis · PrepDeck

Two of the most widely used NoSQL databases: MongoDB's document model for flexible schemas, and Redis as an in-memory store for caching, queues, and pub/sub.

Starting from Zero — A Physical Intuition

Before discussing NoSQL designs, let's visualize MongoDB and Redis through physical analogies:

MongoDB as the Folder Binder (Document Store): In a relational database, a client's profile, their address history, and their list of skills are stored on separate shelves (tables) and joined when queried. MongoDB stores everything about a client in a single multi-pocket binder (a Document). When you pull the client's binder, you get their profile, address, and skills in one movement, without looking elsewhere.
Redis as the Desk Clipboard (In-Memory Key-Value): A relational database keeps its data on disk (the basement archive). Reading anything that is not already cached in memory means a trip to storage (walking downstairs). Redis keeps its data in RAM (a clipboard on your desk). Reads and writes are typically sub-millisecond, but the clipboard has limited space (RAM is expensive), and if the power goes out it is wiped clean unless you've made copies on paper (Redis persistence).

The NoSQL Landscape

NoSQL ("Not Only SQL") databases break from the relational table model. Different problems call for different data models:

Type	Database	Best for
Document	MongoDB, Firestore	Flexible schemas, nested data, fast iteration
Key-Value	Redis, DynamoDB	Caching, sessions, simple lookups
Wide-Column	Cassandra, HBase	Time-series, high write throughput at scale
Graph	Neo4j	Relationships, social networks, recommendations

This article covers MongoDB (document) and Redis (key-value + more) — the two you'll encounter most in interviews and production systems.

MongoDB — Document Database

Core Concepts

┌─────────────────────────────────────────────────────────────┐
│  SQL                    MongoDB equivalent                    │
│  Database       →       Database                             │
│  Table          →       Collection                           │
│  Row            →       Document (JSON/BSON)                 │
│  Column         →       Field                                │
│  Primary Key    →       _id (auto-generated ObjectId)        │
│  JOIN           →       $lookup (aggregation) or embedding   │
└─────────────────────────────────────────────────────────────┘

A document is a JSON-like record that can have nested objects and arrays:

{
  "_id": "ObjectId('64a1b2c3d4e5f67890123456')",
  "name": "Alice Johnson",
  "email": "alice@example.com",
  "address": {
    "street": "123 Main St",
    "city": "San Francisco",
    "country": "US"
  },
  "skills": ["Python", "JavaScript", "MongoDB"],
  "experience": [
    { "company": "Google", "years": 3, "role": "SDE-2" },
    { "company": "Startup", "years": 2, "role": "SDE-3" }
  ],
  "createdAt": "ISODate('2024-01-15T10:30:00Z')"
}

Why documents?

Matches how you actually think about data (user "has" an address)
No expensive JOINs for co-located data — one read gets the whole entity
Schema flexibility — add fields without migrations
Natural fit for APIs that consume/produce JSON

CRUD Operations

// MongoDB Node.js driver
const { MongoClient, ObjectId } = require("mongodb");
const client = new MongoClient(process.env.MONGO_URI);
const db = client.db("prepdeck");
const users = db.collection("users");

// INSERT
const result = await users.insertOne({
  name: "Alice",
  email: "alice@example.com",
  skills: ["JavaScript"],
  createdAt: new Date(),
});
console.log(result.insertedId);  // ObjectId

await users.insertMany([
  { name: "Bob", email: "bob@example.com" },
  { name: "Carol", email: "carol@example.com" },
]);

// READ
const user = await users.findOne({ email: "alice@example.com" });
const allUsers = await users.find({ isActive: true }).toArray();
const page = await users.find({}).sort({ name: 1 }).skip(10).limit(5).toArray();  // sort by name, skip 10, then take 5

// Query operators
const matches = await users.find({
  age: { $gte: 18, $lt: 30 },         // $gte, $gt, $lt, $lte, $ne
  skills: { $in: ["JavaScript", "Python"] }, // skills array contains at least one of these
  email: { $regex: /@gmail\.com$/i },  // regex match
  "address.city": "San Francisco",     // dot notation for nested fields
}).toArray();

// UPDATE
await users.updateOne(
  { email: "alice@example.com" },      // filter
  {
    $set: { name: "Alice Smith" },     // set specific fields
    $push: { skills: "TypeScript" },  // add to array
    $inc: { loginCount: 1 },          // increment
    $currentDate: { updatedAt: true },
  }
);

// Upsert — insert if not found
await users.updateOne(
  { email: "new@example.com" },
  { $setOnInsert: { name: "New User", createdAt: new Date() } },
  { upsert: true }
);

// DELETE
await users.deleteOne({ email: "alice@example.com" });
await users.deleteMany({ isActive: false, createdAt: { $lt: new Date("2020-01-01") } });

Indexes in MongoDB

// Single field
await users.createIndex({ email: 1 }, { unique: true });

// Compound index (order matters — matches leftmost prefix)
const posts = db.collection("posts");
await posts.createIndex({ authorId: 1, createdAt: -1 });
// Serves queries on: authorId alone, authorId + createdAt
// Does NOT serve: createdAt alone

// Text index for search
await posts.createIndex({ title: "text", content: "text" });
const searchResults = await posts.find({ $text: { $search: "react tutorial" } }).toArray();

// Check index usage
await users.find({ email: "a@b.com" }).explain("executionStats");
// Look for: "IXSCAN" (good) vs "COLLSCAN" (bad — full collection scan)
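
The leftmost-prefix rule for compound indexes can be sketched as a small function. This is a simplified model, not how the MongoDB query planner is implemented: it assumes equality filters only and ignores sorts and range conditions. The name `usableIndexPrefix` is made up for illustration.

```javascript
// Simplified model of the leftmost-prefix rule for compound indexes.
// Assumption: equality filters only; a real planner also weighs sorts and ranges.
function usableIndexPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // the prefix ends at the first index field the query skips
    prefix.push(field);
  }
  return prefix;
}

const index = ["authorId", "createdAt"]; // mirrors { authorId: 1, createdAt: -1 }

console.log(usableIndexPrefix(index, ["authorId"]));              // [ 'authorId' ]
console.log(usableIndexPrefix(index, ["authorId", "createdAt"])); // [ 'authorId', 'createdAt' ]
console.log(usableIndexPrefix(index, ["createdAt"]));             // []
```

An empty result corresponds to the "does NOT serve" case: a query on `createdAt` alone cannot start from the first field of the index.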

Aggregation Pipeline

The aggregation pipeline transforms documents through stages (like Unix pipes):

const result = await db.collection("orders").aggregate([
  // Stage 1: Filter
  { $match: { status: "completed", createdAt: { $gte: new Date("2024-01-01") } } },

  // Stage 2: Join with users collection
  { $lookup: {
    from: "users",
    localField: "userId",
    foreignField: "_id",
    as: "user"
  }},
  { $unwind: "$user" },  // Flatten array from $lookup

  // Stage 3: Group and aggregate
  { $group: {
    _id: "$user.country",
    totalRevenue: { $sum: "$amount" },
    orderCount: { $count: {} },  // $count as a $group accumulator needs MongoDB 5.0+; { $sum: 1 } works on older versions
    avgOrderValue: { $avg: "$amount" },
  }},

  // Stage 4: Filter groups
  { $match: { totalRevenue: { $gt: 10000 } } },

  // Stage 5: Sort
  { $sort: { totalRevenue: -1 } },

  // Stage 6: Limit
  { $limit: 10 },

  // Stage 7: Project (shape the output)
  { $project: {
    _id: 0,
    country: "$_id",
    totalRevenue: { $round: ["$totalRevenue", 2] },
    orderCount: 1,
    avgOrderValue: { $round: ["$avgOrderValue", 2] },
  }},
]).toArray();
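
To make the stage-by-stage idea concrete, here is a rough in-memory equivalent of `$match`, `$group`, and `$sort` using plain arrays. It only illustrates what each stage does to the data flowing through; the sample orders are invented, and the real pipeline runs inside the database, not in application code.

```javascript
// In-memory walk-through of $match -> $group -> $sort on a few sample orders.
// The sample data is made up for illustration.
const orders = [
  { country: "US", amount: 120, status: "completed" },
  { country: "US", amount: 80, status: "completed" },
  { country: "IN", amount: 50, status: "completed" },
  { country: "IN", amount: 999, status: "cancelled" },
];

// $match: keep only completed orders
const matched = orders.filter((o) => o.status === "completed");

// $group: one bucket per country, accumulating a sum and a count
const groups = new Map();
for (const o of matched) {
  const g = groups.get(o.country) ?? { country: o.country, totalRevenue: 0, orderCount: 0 };
  g.totalRevenue += o.amount;
  g.orderCount += 1;
  groups.set(o.country, g);
}

// $sort: highest revenue first
const report = [...groups.values()].sort((a, b) => b.totalRevenue - a.totalRevenue);

console.log(report);
```

Each step consumes the output of the previous one, which is why putting `$match` early (before `$lookup` or `$group`) reduces the work later stages have to do.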

Data Modeling — Embed vs Reference

The biggest MongoDB design decision:

// EMBED: when you always need the data together, or sub-documents are small
// User with embedded address and preferences
{
  _id: ObjectId("..."),
  name: "Alice",
  address: { street: "123 Main", city: "SF" },    // always fetched with user
  preferences: { theme: "dark", language: "en" }  // small, user-specific
}

// REFERENCE: when data is large, shared, or queried independently
// Post with author reference (not embedded user document)
{
  _id: ObjectId("..."),
  title: "React Tutorial",
  authorId: ObjectId("user-id-here"),  // reference — don't copy user data
  tags: ["react", "javascript"],
}
// Fetch: two queries, or $lookup in aggregation

// Rule of thumb:
// "Contains" relationship → embed (user contains address)
// "References" relationship → reference (post references author)
// High cardinality (user has thousands of posts) → reference
// Low cardinality, always needed together → embed
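
The rule of thumb above can be written down as a checklist function. This is only a sketch for practicing the reasoning: `chooseModel`, its field names, and the cutoff of 100 children are illustrative assumptions, not MongoDB guidance.

```javascript
// Hypothetical helper that turns the rule of thumb into a checklist.
// The field names and the threshold of 100 are illustrative assumptions.
function chooseModel({ maxChildren, readTogether, sharedAcrossParents, queriedAlone }) {
  if (sharedAcrossParents) return "reference"; // copying shared data means updating every copy
  if (queriedAlone) return "reference";        // needs its own collection and indexes
  if (maxChildren > 100) return "reference";   // large or unbounded arrays bloat the parent document
  return readTogether ? "embed" : "reference";
}

// user -> address: few, private to the user, always read together
console.log(chooseModel({ maxChildren: 3, readTogether: true, sharedAcrossParents: false, queriedAlone: false })); // embed

// post -> author: one author is shared by many posts
console.log(chooseModel({ maxChildren: 1, readTogether: true, sharedAcrossParents: true, queriedAlone: true })); // reference

// user -> posts: thousands per user
console.log(chooseModel({ maxChildren: 5000, readTogether: false, sharedAcrossParents: false, queriedAlone: true })); // reference
```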

Redis — In-Memory Data Store

Redis (Remote Dictionary Server) is an in-memory key-value store. It's absurdly fast — sub-millisecond latency — because it stores everything in RAM.

Use Redis for:
✅ Caching (reduce DB load)
✅ Session storage
✅ Rate limiting
✅ Pub/Sub messaging
✅ Distributed locks
✅ Job queues (with Redis Queue / BullMQ)
✅ Leaderboards (sorted sets)
✅ Real-time analytics counters

Data Types

# STRING — the basic type (values: strings, integers, binary)
SET user:1:name "Alice"
GET user:1:name                   # "Alice"
INCR page:views                   # atomic increment (thread-safe counter)
INCRBY page:views 10
SETNX lock:resource "worker-1"    # Set if Not eXists (no TTL; for locks prefer SET key value NX PX <ms>)
SETEX session:abc123 3600 "user:1" # Set with TTL (3600 seconds = 1 hour)
TTL session:abc123                 # time remaining

# HASH — like a mini-object (perfect for user sessions, caching objects)
HSET user:1 name "Alice" email "alice@example.com" role "admin"
HGET user:1 name           # "Alice"
HGETALL user:1             # all fields
HMGET user:1 name email    # multiple fields
HINCRBY user:1 loginCount 1

# LIST — ordered, allows duplicates (queues, recent items)
RPUSH notifications:user:1 "New message" "Friend request"  # push to right
LPUSH notifications:user:1 "Urgent alert"                  # push to left
LRANGE notifications:user:1 0 -1                           # all items
LPOP notifications:user:1                                  # pop from left (queue)
BLPOP notifications:user:1 30  # blocking pop (wait up to 30s — great for workers)

# SET — unordered, unique values (tags, unique visitors, "who liked this")
SADD post:1:likes user:1 user:2 user:3
SISMEMBER post:1:likes user:1   # is user:1 in the set? (1=yes, 0=no)
SCARD post:1:likes              # count
SUNION post:1:likes post:2:likes  # union of two sets

# SORTED SET — unique members with a score (leaderboards, rate limiting)
ZADD leaderboard 1500 "alice" 2300 "bob" 800 "carol"
ZRANK leaderboard "alice"       # rank (0-indexed, ascending)
ZREVRANK leaderboard "bob"      # rank by descending score, 0-indexed (bob has the top score, so this returns 0)
ZRANGE leaderboard 0 2 WITHSCORES    # bottom 3 with scores
ZREVRANGE leaderboard 0 2 WITHSCORES # top 3 with scores
ZINCRBY leaderboard 100 "alice"      # add to alice's score

Caching Pattern

// Cache-aside pattern (most common)
async function getUser(id) {
  const cacheKey = `user:${id}`;

  // 1. Check cache
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // 2. Cache miss — fetch from DB
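  // Assumes a node-postgres style client, where query() resolves to { rows }
  const { rows } = await db.query("SELECT * FROM users WHERE id = $1", [id]);
  const user = rows[0];
  if (!user) return null;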

  // 3. Store in cache with TTL (avoid stale data forever)
  await redis.setex(cacheKey, 300, JSON.stringify(user));  // 5 minutes

  return user;
}

// Cache invalidation — delete or update cache when data changes
async function updateUser(id, updates) {
  await db.query("UPDATE users SET ...", [id, ...updates]);
  await redis.del(`user:${id}`);  // invalidate cache
}
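
One gap in plain cache-aside is the stampede: when a hot key expires, every concurrent request misses and hits the database at once. A minimal sketch of one mitigation, in-process request coalescing, follows. `createCachedLoader`, `cache`, and `loadFromDb` are hypothetical names, and the cache is an in-memory stand-in so the example runs without Redis. This only deduplicates within a single process; across several processes you would additionally need something like a short-lived Redis lock.

```javascript
// In-process request coalescing ("single flight") in front of a cache.
// `cache` is any object with async get/set; `loadFromDb` is a hypothetical loader.
function createCachedLoader(cache, loadFromDb) {
  const inFlight = new Map(); // key -> promise of a load that is already running

  return async function get(key) {
    const cached = await cache.get(key);
    if (cached !== undefined && cached !== null) return cached;

    if (!inFlight.has(key)) {
      const load = (async () => {
        try {
          const value = await loadFromDb(key);
          if (value !== null && value !== undefined) await cache.set(key, value);
          return value;
        } finally {
          inFlight.delete(key); // allow a fresh load once this one settles
        }
      })();
      inFlight.set(key, load);
    }
    return inFlight.get(key); // concurrent callers share the same promise
  };
}

// Demo with an in-memory cache and a slow fake loader.
const store = new Map();
const cache = {
  get: async (k) => store.get(k),
  set: async (k, v) => { store.set(k, v); },
};
let dbCalls = 0;
const loadFromDb = async (key) => {
  dbCalls += 1;
  await new Promise((resolve) => setTimeout(resolve, 20));
  return { id: key, name: "Alice" };
};

const getUser = createCachedLoader(cache, loadFromDb);
Promise.all([getUser("user:1"), getUser("user:1"), getUser("user:1")])
  .then(() => console.log("DB calls:", dbCalls)); // DB calls: 1
```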

Rate Limiting

// Sliding window rate limiter using sorted sets
async function isRateLimited(userId, limit = 100, windowSeconds = 3600) {
  const now = Date.now();
  const windowStart = now - windowSeconds * 1000;
  const key = `ratelimit:${userId}`;

  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart);    // remove old entries
  pipeline.zadd(key, now, `${now}:${Math.random()}`); // add current request (random suffix keeps same-millisecond requests distinct)
  pipeline.zcard(key);                               // count requests in window
  pipeline.expire(key, windowSeconds);               // set TTL
  const results = await pipeline.exec();

  const requestCount = results[2][1];
  return requestCount > limit;
}

// Simple fixed window with strings (faster, less accurate)
async function simpleRateLimit(key, limit, window) {
  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, window);  // first request in the window sets the TTL (in seconds)
  return count > limit;
}
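
The sliding-window logic is easier to reason about without a server in the loop. The sketch below mirrors the sorted-set steps with an in-memory array per key. `createSlidingWindowLimiter` is a made-up name, and the clock is passed in as a parameter purely so the behaviour is deterministic. Like the Redis version above, it records rejected requests too.

```javascript
// In-memory model of the sliding-window log, mirroring the sorted-set steps.
// `now` is injectable so the behaviour is easy to test; all names are illustrative.
function createSlidingWindowLimiter(limit, windowMs) {
  const log = new Map(); // key -> timestamps of requests, oldest first

  return function isRateLimited(key, now = Date.now()) {
    const windowStart = now - windowMs;
    // like ZREMRANGEBYSCORE key 0 windowStart: drop entries at or before the window start
    const recent = (log.get(key) ?? []).filter((t) => t > windowStart);
    recent.push(now);             // like ZADD: record this request
    log.set(key, recent);
    return recent.length > limit; // like ZCARD compared against the limit
  };
}

const limited = createSlidingWindowLimiter(2, 1000); // 2 requests per second
console.log(limited("ip:1", 0));    // false (1 in window)
console.log(limited("ip:1", 100));  // false (2 in window)
console.log(limited("ip:1", 200));  // true  (3 in window)
console.log(limited("ip:1", 1150)); // false (entries at 0 and 100 have aged out)
```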

Pub/Sub

// Publisher (one process)
const publisher = redis.duplicate();
await publisher.publish("notifications", JSON.stringify({
  type: "new_message",
  userId: 1,
  message: "Hello!"
}));

// Subscriber (another process)
const subscriber = redis.duplicate();
await subscriber.subscribe("notifications");
subscriber.on("message", (channel, message) => {
  const data = JSON.parse(message);
  console.log(`Received on ${channel}:`, data);
});

// Pattern subscribe (wildcard channels)
await subscriber.psubscribe("notifications:*");
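
Two properties of Redis pub/sub are worth internalizing: delivery is fire-and-forget (a message published while nobody is subscribed is simply dropped), and pattern subscriptions match on channel names. The in-memory `Broker` below is a hypothetical stand-in that shows both. It handles only a trailing `*` wildcard, whereas Redis patterns support richer glob syntax.

```javascript
// Minimal in-memory broker illustrating fire-and-forget delivery and channel patterns.
// Only a trailing "*" wildcard is handled here; Redis patterns support more glob syntax.
class Broker {
  constructor() {
    this.subscribers = []; // { pattern, handler }
  }
  subscribe(pattern, handler) {
    this.subscribers.push({ pattern, handler });
  }
  publish(channel, message) {
    const matches = this.subscribers.filter(({ pattern }) =>
      pattern.endsWith("*")
        ? channel.startsWith(pattern.slice(0, -1))
        : channel === pattern
    );
    for (const { handler } of matches) handler(channel, message);
    return matches.length; // number of receivers, like the reply to PUBLISH
  }
}

const broker = new Broker();
console.log(broker.publish("notifications:1", "lost")); // 0, nobody was listening

const received = [];
broker.subscribe("notifications:*", (channel, message) => received.push([channel, message]));
console.log(broker.publish("notifications:1", "hello")); // 1
console.log(received); // [ [ 'notifications:1', 'hello' ] ]
```

If subscribers must not miss messages while they are offline, pub/sub alone is not enough; a list-based queue (as in the LIST examples earlier) keeps items until a worker pops them.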

Distributed Locks (Redlock)

const crypto = require("crypto");

// Simple lock on a single Redis node (for multi-node setups, see the Redlock algorithm)
async function acquireLock(resource, ttl = 30000) {
  const lockKey = `lock:${resource}`;
  const token = crypto.randomUUID();

  // NX = set only if not exists, PX = TTL in milliseconds
  const result = await redis.set(lockKey, token, "NX", "PX", ttl);
  return result === "OK" ? token : null;
}

async function releaseLock(resource, token) {
  const lockKey = `lock:${resource}`;
  // Use Lua script for atomic check-and-delete (prevent releasing someone else's lock)
  const script = `
    if redis.call("get", KEYS[1]) == ARGV[1] then
      return redis.call("del", KEYS[1])
    else
      return 0
    end
  `;
  return redis.eval(script, 1, lockKey, token);
}
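
In practice the acquire and release calls are usually wrapped so the lock is released even if the protected work throws. A minimal sketch follows, assuming functions with the same contract as `acquireLock` and `releaseLock` above. The `withLock` helper and the in-memory `fakeLocks` object are hypothetical and exist only so the example runs without a Redis server.

```javascript
// Hypothetical wrapper: run `task` only while holding the lock, and always release.
// `acquire` and `release` are injected so the sketch runs without a Redis server.
async function withLock(resource, { acquire, release }, task) {
  const token = await acquire(resource);
  if (!token) return { ran: false };       // someone else holds the lock
  try {
    return { ran: true, value: await task() };
  } finally {
    await release(resource, token);        // token check avoids freeing another holder's lock
  }
}

// In-memory stand-ins with the same contract as acquireLock / releaseLock.
const held = new Map();
let nextToken = 0;
const fakeLocks = {
  acquire: async (resource) => {
    if (held.has(resource)) return null;
    const token = `token-${++nextToken}`;
    held.set(resource, token);
    return token;
  },
  release: async (resource, token) => {
    if (held.get(resource) !== token) return 0;
    held.delete(resource);
    return 1;
  },
};

withLock("report", fakeLocks, async () => "done").then(console.log);
```

With the real Redis-backed functions, keep in mind that the lock expires after its TTL: if the task runs longer than that, another worker can acquire the lock while the first is still working.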

Redis Persistence

# RDB (snapshots) — default; compact, good for backups
# Saves a snapshot of the dataset at intervals
save 900 1      # save if >= 1 key changed in 900 seconds
save 300 10     # save if >= 10 keys changed in 300 seconds

# AOF (Append Only File) — logs every write operation
appendonly yes
appendfsync everysec  # sync every second (good balance)
# or: always (safest, slowest), no (fastest, can lose data)

# Both: use RDB for backups + AOF for durability (recommended for production)

Common Interview Questions

Design drills

NoSQL is modeling around the access pattern, not normal forms. Whiteboard each before revealing the checklist.

Whiteboard each one out loud for 5–10 minutes before you reveal what a strong answer covers; the gap between your sketch and the checklist is your study list.

Warm-up: Model a blog (users, posts, comments) in MongoDB. Embed or reference each relationship?
Core: Pick the Redis structure for: a leaderboard, per-IP rate limiting, a session cache, a work queue.
Core: When do you reach for MongoDB over Postgres, honestly?
Stretch: Redis is in-memory. How do you handle persistence and running out of memory?
Stretch: Design a cache-aside layer with Redis in front of Postgres, with stampede protection.

Think it through like the interview

Don't just choose NoSQL randomly — derive document schemas based on query patterns and limits.

Think it through: MongoDB Schema Design (Embed vs Reference)

Problem: You are building an e-commerce platform. You need to model `Users`, `Addresses`, and `Orders`. Design the MongoDB schema and decide whether to embed or reference each relationship.

1. Evaluate the User-Address relationship: “A user typically has 1 to 3 addresses. Should you embed address data inside the User document, or keep a separate Addresses collection? Why?”
2. Evaluate the User-Orders relationship: “A user can make thousands of orders over years. Should you embed orders inside an array inside the User document? Consider document size boundaries.”
3. Design for high write scale: “Suppose orders are frequently updated and read independently. How does referencing orders help write throughput compared to embedding?”
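
For the document size question in stage 2, a back-of-envelope estimate helps. MongoDB caps a single BSON document at 16 MiB. The sketch below uses JSON length as a rough stand-in for BSON size, which is an assumption (real BSON sizes differ somewhat), and the `sampleOrder` shape and the 1 KiB parent overhead are invented for illustration.

```javascript
// Back-of-envelope check: how many embedded orders fit in one user document?
// MongoDB caps a BSON document at 16 MiB. JSON length is used as a rough
// stand-in for BSON size (an assumption; real BSON sizes differ somewhat).
const MAX_DOC_BYTES = 16 * 1024 * 1024;

function estimateMaxEmbedded(sampleChild, parentOverheadBytes = 1024) {
  const childBytes = Buffer.byteLength(JSON.stringify(sampleChild), "utf8");
  return Math.floor((MAX_DOC_BYTES - parentOverheadBytes) / childBytes);
}

// Hypothetical order shape, for illustration only.
const sampleOrder = {
  orderId: "64a1b2c3d4e5f67890123456",
  items: [{ sku: "ABC-123", qty: 2, price: 19.99 }],
  total: 39.98,
  status: "completed",
  createdAt: "2024-01-15T10:30:00Z",
};

console.log(estimateMaxEmbedded(sampleOrder));
```

Even when the estimate looks comfortably large, the hard limit is not the only cost: every read and every update of the user document carries the full embedded array with it, which is the core of the argument for referencing orders.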

Practice

MongoDB: Model a social media app — users, posts, comments, likes. Decide what to embed vs reference. Write aggregation to find the top 10 most-liked posts from the last 7 days.
Redis Caching: Implement cache-aside for a product catalog endpoint. Handle cache stampede with a "dog-pile" prevention mechanism.
Leaderboard: Use Redis sorted sets to implement a gaming leaderboard — add score, get top 10, get a player's rank, get players around a given rank.
Rate Limiter: Implement a sliding window rate limiter using Redis sorted sets that allows 100 requests per minute per IP.
