@system_monarch: https://x.com/system_monarch/status/2057714149451497544
Summary
A comprehensive cheat sheet of 12 system design patterns for technical interviews, including signals, building blocks, and deep-dives for each pattern, based on 200+ interviews across top tech companies.
Cached at: 05/22/26, 01:56 PM
The System Design Pattern Cheat Sheet I Wish I Had Earlier
Over the last 12 years, I’ve given 200+ interviews across Microsoft, Atlassian, Google, Uber, Salesforce, Amazon, Walmart and a bunch of other companies.
One habit kept saving me again and again. Before drawing any boxes, I tried to label the pattern.
Once the pattern is clear, the architecture gets simpler. Your trade-offs become sharper.
And your answers stop sounding like a random list of buzzwords.
Most candidates fail system design interviews not because they don’t know the technology, but because they jump straight into solutions without first understanding what kind of problem they’re actually solving. A news feed and a payment ledger live in completely different universes. The tools that work brilliantly for one will quietly destroy the other.
Here’s the system design pattern cheat sheet I wish I’d had earlier. Every pattern has three subsections: “Signals” (what to listen for), “Building blocks” (the components generally used to solve that pattern), and “Deep-dives” (the trade-offs specific to that pattern).
Step 0: Classify the problem in thirty seconds
When you hear the question and jot down the functional requirements, answer these questions mentally before you say a single word:
-
Which dominates: reads, writes, or both equally?
-
Do users need updates right now, or is near real-time enough?
-
Is the main value in serving user traffic, or in crunching data later?
-
Are we dealing with one user at a time, or massive fanout to millions?
-
Is correctness absolute (money, inventory) or best-effort (likes, recommendations)?
Once you answer these, you can usually map the problem to one of the 12 patterns below. The pattern then tells you which primitives matter, which trade-offs to bring up, and which buzzwords are noise.
Let’s go.
Pattern 1: Read-Heavy Systems
Examples that fall here: News feed, profile service, product catalog, URL shortener reads, content viewing, “Design Twitter timeline,” “Design Wikipedia,” “Design a CDN.”
Signals to listen for
-
The read-to-write ratio is wildly skewed. Think 100:1, 1000:1, or worse.
-
Data does not change every second for a given user. A profile updates once a week, but is viewed thousands of times.
-
Latency matters a lot. Consistency can be slightly relaxed — users will tolerate seeing a 2-second-old like count.
-
The same data is requested by many users (popular content) or by the same user repeatedly (their own feed).
Core building blocks
-
Cache layer: Redis or Memcached for hot keys, aggregates, and computed views. Multi-tier caching is common — browser cache, CDN, edge cache, application cache, database cache.
-
Read replicas: Extra database replicas for scaling reads. Writes go to the primary, reads fan out across replicas.
-
Search and secondary indexes: Elasticsearch or similar for flexible filtering and full-text queries.
-
Background writers / pre-computation workers: Services that pre-compute feeds, leaderboards, or aggregates so the read path is just a lookup.
-
CDN: For any static or semi-static content, push it as close to the user as possible.
Deep-dives and trade-offs that separate staff-level answers
-
Caching strategy is not “use Redis.” The real conversation is what you cache: individual objects, full pages, computed feeds, or query results. Each has a different invalidation story and a different memory cost.
-
Cache invalidation is the hard part. TTL-based expiry is simple but serves stale data until the TTL runs out. Write-through caches are consistent but expensive. Write-behind is fast but can lose buffered writes on a crash. Cache-aside is the default for a reason, but it has the classic stampede problem: when a hot key expires, a thousand requests rush to the database simultaneously. Talk about request coalescing or probabilistic early expiration.
-
Consistency vs availability when a replica lags. What does the user see right after they post something if the read goes to a stale replica? “Read-your-own-writes” consistency usually means routing the user’s own reads to the primary for a short window.
-
Hot key problem. A celebrity’s profile or a viral tweet can melt a single Redis shard. Solutions: replicate hot keys across multiple nodes, use local in-process caches in front of Redis, or pre-compute and serve from a static path.
-
Fan-out on write vs fan-out on read. A classic for Twitter-style feeds. Fan-out on write is great for normal users but disastrous for celebrities with 100M followers. Most real systems do hybrid: fan-out on write for normal users, fan-out on read for celebrities, merged at query time.
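Of the two stampede fixes mentioned above, probabilistic early expiration is the easier one to show in a few lines. The sketch below is a single-process, in-memory illustration of the idea (often called "XFetch"); the class name, the `beta` parameter, and the injectable `clock` and `rng` are assumptions for the example, and it does not do request coalescing across threads or servers.

```python
import math
import random
import time
from typing import Callable, Dict, Tuple


class EarlyExpiringCache:
    """Cache-aside with probabilistic early expiration (illustrative, single process).

    Each entry remembers how long it took to compute (delta). On every read, a
    request may volunteer to recompute before the TTL runs out; the chance grows
    as expiry approaches and as delta grows, so refreshes spread out instead of
    every request missing at the same instant.
    """

    def __init__(self, ttl: float, beta: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Callable[[], float] = random.random) -> None:
        self.ttl = ttl
        self.beta = beta  # > 1 refreshes earlier, < 1 later
        self.clock = clock
        self.rng = rng
        self._store: Dict[str, Tuple[object, float, float]] = {}

    def get(self, key: str, load: Callable[[], object]) -> object:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None:
            value, delta, expires_at = entry
            u = max(self.rng(), 1e-12)  # avoid log(0)
            # -log(u) >= 0, so this adds a random "head start" scaled by delta * beta.
            if now - delta * self.beta * math.log(u) < expires_at:
                return value
        # Miss, expired, or picked for early refresh: read the source of truth.
        start = self.clock()
        value = load()
        delta = self.clock() - start
        self._store[key] = (value, delta, self.clock() + self.ttl)
        return value
```

With `rng` fixed at 1.0 the cache behaves like plain TTL expiry. In production you would keep the default `random.random`, and for truly hot keys still pair this with request coalescing.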
A senior candidate says “we’ll cache it.” A staff candidate explains what they cache, how it gets invalidated, what happens during a stampede, and how the hot key problem is handled.
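The hybrid fan-out strategy from the last deep-dive can be sketched in memory as below. The follower-count cutoff (`celebrity_threshold`) and all class and method names are hypothetical; a real system would keep timelines in something like Redis and cap their length. Posts from normal authors are pushed at write time, celebrity posts are pulled at read time, and the two are merged and de-duplicated by sequence number.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Post = Tuple[int, str, str]  # (sequence number, author, text); higher seq = newer


class HybridFeed:
    def __init__(self, celebrity_threshold: int = 10_000) -> None:
        self.celebrity_threshold = celebrity_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.following: Dict[str, Set[str]] = defaultdict(set)
        self.timelines: Dict[str, List[Post]] = defaultdict(list)  # precomputed on write
        self.posts_by_author: Dict[str, List[Post]] = defaultdict(list)
        self._seq = 0

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def post(self, author: str, text: str) -> None:
        self._seq += 1
        item = (self._seq, author, text)
        self.posts_by_author[author].append(item)
        if not self.is_celebrity(author):
            # Fan-out on write: push into every follower's precomputed timeline.
            for follower in self.followers[author]:
                self.timelines[follower].append(item)

    def read(self, user: str, limit: int = 10) -> List[str]:
        merged = {item[0]: item for item in self.timelines[user]}
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Fan-out on read: pull recent celebrity posts at query time.
                for item in self.posts_by_author[author][-limit:]:
                    merged[item[0]] = item
        newest_first = sorted(merged.values(), reverse=True)
        return [text for _, _, text in newest_first[:limit]]
```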
Pattern 2: Write-Heavy or Event-Driven Systems
Examples that fall here: Logging platforms, metrics ingestion, order systems, clickstream collection, payment events, IoT telemetry, “Design a metrics system like Datadog,” “Design an analytics ingestion pipeline.”
Signals to listen for
-
A huge write volume, often millions of events per minute.
-
Most processing happens asynchronously after the user action completes. The user clicks, the event is captured, and the actual work happens later.
-
Reads are usually aggregations, dashboards, or downstream consumers, not single-row lookups.
-
Loss of a single event might be acceptable, or completely unacceptable, depending on whether it’s a tracking pixel or a payment.
Core building blocks
-
Append-only logs: Kafka, Pulsar, Kinesis. These are the backbone of high-throughput write systems.
-
Queues for reliable delivery: SQS, RabbitMQ. They offer different guarantees than streams: queues are good for work distribution, streams are good for replayable event history.
-
Stateless consumer workers: Services that process events independently and can scale horizontally.
-
Idempotent storage writes: So retries are safe and don’t double-count.
-
Schema registry: When you have hundreds of producers and consumers, schema evolution becomes a real problem. Avro or Protobuf with a registry saves you.
Deep-dives and trade-offs that separate staff-level answers
-
At-least-once vs exactly-once vs at-most-once. Most systems are at-least-once with idempotent consumers. Exactly-once is possible (Kafka transactions) but expensive and constrained: it covers read-process-write flows within Kafka, not side effects in external systems. Be specific about which one your system needs and why.
-
Ordering guarantees and partition keys. Kafka only guarantees order within a partition. If you partition by user_id, all events for that user are ordered, but events across users are not. Choosing the wrong partition key creates either hot partitions or weak ordering.
-
Back-pressure. What happens when consumers fall behind? Do producers slow down, do you spill to disk, do you drop events? A real system needs a strategy.
-
Dead letter queues and poison messages. A single malformed event can block an entire partition if your consumer crashes on it. DLQs let you skip and investigate later.
-
Replay and reprocessing. Can you re-run the last 7 days of events through a new consumer to backfill a feature? If yes, your event log is a replayable source of history, the core property behind event-sourced architectures. If no, you have a fragile pipeline.
-
Cost of retention. Storing 30 days of events in Kafka is expensive. Tiered storage (recent in Kafka, older in S3) is increasingly standard.
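Several of these deep-dives (at-least-once delivery, idempotent consumers, dead letter queues) fit in one small sketch. This is a hypothetical in-memory consumer, not a Kafka client API: `processed_ids` stands in for a durable dedupe store, and `dead_letters` stands in for a DLQ topic.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Event:
    event_id: str  # producer-assigned unique id, used for deduplication
    payload: str   # raw JSON; may be malformed (a "poison message")


@dataclass
class IdempotentConsumer:
    handle: Callable[[dict], None]
    max_attempts: int = 3
    processed_ids: Set[str] = field(default_factory=set)     # stand-in for a durable store
    dead_letters: List[Event] = field(default_factory=list)  # stand-in for a DLQ topic

    def consume(self, event: Event) -> str:
        if event.event_id in self.processed_ids:
            return "duplicate"  # at-least-once redelivery: skip, do not double-count
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.handle(json.loads(event.payload))
                break
            except Exception:
                if attempt == self.max_attempts:
                    # Park the event so it cannot block the rest of the partition.
                    self.dead_letters.append(event)
                    return "dead-lettered"
        # In production, record the id atomically with the handler's side effect;
        # otherwise a crash between the two leads to a reprocess on restart.
        self.processed_ids.add(event.event_id)
        return "processed"
```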
The interview trap: saying “Kafka” without explaining partition keys, consumer group semantics, or what happens when a consumer crashes mid-batch.
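On partition keys, the essential property is that the same key always maps to the same partition, which is what preserves per-key order. The sketch below uses MD5 purely as a stable hash for illustration; Kafka's default partitioner uses murmur2 on the key bytes, so actual partition numbers will differ. Python's built-in `hash()` is unsuitable here because it is randomized per process for strings. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Stable key -> partition mapping: the same key always lands in the same partition."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(events: Iterable[Tuple[str, str]], num_partitions: int) -> Dict[int, List[Tuple[str, str]]]:
    """Append each (key, value) event to its partition, preserving arrival order per partition."""
    partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```

All of `user-1`'s events stay in order within one partition, but nothing orders them relative to `user-2`. A key with disproportionate traffic still turns its partition into a hot spot.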
Pattern 3: Real-Time Fanout Systems
Examples that fall here: Chat (WhatsApp, Slack), notifications, live scores, stock tickers, collaborative documents (Google Docs, Figma), live video comments, multiplayer game state, “Design Slack,” “Design a real-time leaderboard.”
Signals to listen for
-
Many users need to see updates within seconds, often sub-second.
-
One event often fans out to thousands or millions of clients.
-
Long-lived connections or push channels are needed — this is fundamentally different from request-response.
-
Users come and go. Some are connected, some are offline and need to catch up.
Core building blocks
-
WebSockets, Server-Sent Events, or long polling. WebSockets for bidirectional, SSE for server-to-client only, long polling as a fallback when WebSockets are blocked.
-
Real-time gateway layer: A separate tier of stateful servers that hold open connections. Decoupled from your stateless business logic.
-
Pub/sub for fanout: Kafka topics, Redis pub/sub, or NATS to deliver messages from publishers to all relevant gateways.
-
Presence service: Tracks which user is connected to which gateway node so messages can be routed correctly.
-
Message history store: So users who reconnect can fetch what they missed.
Deep-dives and trade-offs that separate staff-level answers
-
Per-connection cost. Each WebSocket connection holds memory and a file descriptor. A single Node.js server might handle 50K-100K connections; tuned Go or Rust servers can hit a million. At Slack’s scale, this matters enormously.
-
Sharding users across gateways. Consistent hashing on user ID is common. But what happens when a gateway dies holding 100K connections? Those users all reconnect simultaneously — a thundering herd that can take down adjacent gateways.
-
Fanout amplification. A single message in a Slack channel with 10,000 members triggers 10,000 deliveries. Group messaging needs careful design to avoid quadratic blowups.
-
Catching up after disconnect. When a user reconnects after 30 seconds, do they need every message they missed? You need a message log per channel with cursor-based reads.
-
Graceful degradation. If your real-time layer is overloaded, can you fall back to 5-second polling? Many production systems do.
-
Ordering and deduplication across reconnects. A message might be delivered, the connection drops before ack, and the same message is delivered again. Idempotency on the client side matters.
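Catch-up after disconnect and client-side deduplication work together: the server keeps a per-channel log with sequence numbers, and the client tracks the last contiguous sequence it has applied plus the message ids it has already shown. A minimal in-memory sketch with hypothetical names; the cursor only advances over a gap-free prefix, so anything missed during a live push is re-fetched on catch-up and any repeat is dropped by id.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Message = Tuple[int, str, str]  # (sequence number, message id, body)


@dataclass
class ChannelLog:
    """Append-only per-channel log; dense sequence numbers double as read cursors."""
    messages: Dict[str, List[Message]] = field(default_factory=dict)

    def append(self, channel: str, message_id: str, body: str) -> int:
        log = self.messages.setdefault(channel, [])
        seq = len(log) + 1
        log.append((seq, message_id, body))
        return seq

    def read_since(self, channel: str, cursor: int, limit: int = 100) -> List[Message]:
        # Sequence n is stored at index n - 1, so this returns seq > cursor.
        return self.messages.get(channel, [])[cursor:cursor + limit]


@dataclass
class Client:
    cursor: int = 0
    seen_ids: Set[str] = field(default_factory=set)
    inbox: List[str] = field(default_factory=list)

    def deliver(self, seq: int, message_id: str, body: str) -> None:
        if seq == self.cursor + 1:
            self.cursor = seq  # advance only over a contiguous prefix; gaps get re-fetched
        if message_id in self.seen_ids:
            return  # redelivered after a dropped ack: already shown
        self.seen_ids.add(message_id)
        self.inbox.append(body)

    def catch_up(self, log: ChannelLog, channel: str) -> None:
        while True:
            batch = log.read_since(channel, self.cursor)
            if not batch:
                break
            for seq, message_id, body in batch:
                self.deliver(seq, message_id, body)
```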
The interview trap: drawing a single WebSocket server and not thinking about what happens when it dies, or assuming the same servers handle both connections and business logic.
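One common defence against the reconnect thundering herd is exponential backoff with full jitter, so 100K clients that lose the same gateway spread their retries out instead of reconnecting in the same instant. The function name and the `base` and `cap` values below are illustrative assumptions.

```python
import random
from typing import Callable


def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                    rng: Callable[[float, float], float] = random.uniform) -> float:
    """Exponential backoff with full jitter.

    The upper bound doubles per failed attempt (0.5 s, 1 s, 2 s, ...) up to `cap`,
    and the actual delay is drawn uniformly from [0, bound] so clients desynchronize.
    """
    # Clamp the exponent so huge attempt counts cannot overflow the float conversion.
    bound = min(cap, base * (2 ** min(attempt, 32)))
    return rng(0.0, bound)
```

A client would call `time.sleep(reconnect_delay(attempt))` before each retry and reset `attempt` to 0 once a connection succeeds.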
Pattern 4: Batch and Analytics Systems
Examples that fall here: Reporting dashboards, recommendation pipelines, offline aggregations, data lake processing, BI dashboards, “Design YouTube’s view counter,” “Design a system to compute daily active users.”
Signals to listen for
-
Freshness can be minutes or hours late, not milliseconds. “Yesterday’s report at 9 AM” is acceptable.
-
Data volume is huge, often terabytes across many days.
-
Queries scan large parts of the dataset rather than fetching single rows.
-
Multiple consumers need the same data — finance, product analytics, ML training, executive dashboards.
Core building blocks
-
Data lake storage: S3, GCS, ADLS. Cheap, durable, schema-on-read.
-
ETL or ELT pipelines: Spark, Flink, Airflow, dbt, managed services like Glue or Dataflow.
-
Columnar query engines: BigQuery, Redshift, Snowflake, ClickHouse. Columnar storage is what makes scanning billions of rows feasible.
-
Pre-computed aggregates and materialized views: So dashboard queries are fast even on huge tables.
-
Workflow orchestrator: Airflow, Dagster, Prefect. Manages dependencies between jobs.
Deep-dives and trade-offs that separate staff-level answers
-
Lambda vs Kappa architecture. Lambda runs batch and stream in parallel and merges results. Kappa uses streams for everything and replays for reprocessing. Lambda is operationally complex but more flexible; Kappa is simpler but demands a mature streaming setup.
-
Late-arriving data. What happens when an event from yesterday arrives today because a mobile client was offline? Your daily aggregates are now wrong. Solutions: watermarking in Flink, allowing late-window updates, or accepting some inaccuracy with a “data quality” SLA.
-
Batch size vs frequency. Running every 5 minutes gives near-real-time freshness but high overhead. Running daily is efficient but stale. Micro-batching (every 15-30 minutes) is often the sweet spot.
-
Partitioning strategy. Tables partitioned by date, tenant, or region make queries fast and cheap. Wrong partitioning makes a single query scan terabytes when it should scan gigabytes.
-
Cost vs performance. A BigQuery query that scans 10 TB costs real money. Query optimization, materialized views, and partitioning are not just performance concerns — they’re cost concerns.
-
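Separation of online and analytical paths. Your production database should never serve analytical queries. CDC (Debezium, Fivetran) replicates changes to the warehouse by reading the database’s change log (for example the Postgres WAL or the MySQL binlog) rather than querying tables, so the impact on production is minimal.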
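The late-arriving-data deep-dive is easiest to see with event-time windows and a watermark. The sketch below is a simplified, single-process imitation of the Flink-style idea, not Flink's API; the class, `allowed_lateness`, and the side list for late events are assumptions. Events whose day window has already closed are diverted to a side output rather than silently changing a published total.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DAY_SECONDS = 86_400


class DailyCounter:
    """Counts events per event-time day (Unix seconds), with a watermark for late data."""

    def __init__(self, allowed_lateness: int) -> None:
        self.allowed_lateness = allowed_lateness
        self.max_event_time = 0
        self.counts: Dict[int, int] = defaultdict(int)  # day number -> count
        self.late_events: List[Tuple[int, str]] = []    # side output for a correction job

    @property
    def watermark(self) -> int:
        # "We assume no more events older than this will arrive."
        return self.max_event_time - self.allowed_lateness

    def add(self, event_time: int, event_id: str) -> None:
        day = event_time // DAY_SECONDS
        window_end = (day + 1) * DAY_SECONDS
        if window_end <= self.watermark:
            # The day's window already closed (its total may be published): divert.
            self.late_events.append((event_time, event_id))
            return
        self.counts[day] += 1
        self.max_event_time = max(self.max_event_time, event_time)
```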
The interview trap: mixing the analytical and serving paths, or not mentioning how data flows from operational systems into the warehouse.
Pattern 5: Search and Filtering Systems
Examples that fall here: Search bars, job search, product search, autocomplete, log search (Splunk, Datadog), “Design Google search at small scale,” “Design Yelp’s restaurant search,” “Design LinkedIn’s people search.”
Signals to listen for
-
Users query by text, tags, filters, or combinations.
-
Relevance and ranking matter as much as raw speed. The top 10 results are everything.
-
Writes flow into an index that needs to be kept up to date.
-
The query space is open-ended — you can’t pre-compute every possible query.
Core building blocks
-
Search index: Elasticsearch, OpenSearch, Solr. Built on Lucene’s inverted index.
-
Ingestion pipeline: Transforms records from the source of truth into search documents. Often async via Kafka.
-
Query layer: Builds filters, scoring, pagination, faceted aggregations.
-
Caches for hot queries: Popular searches get cached results to reduce load.
-
Ranking signals service: Combines text relevance with business signals (popularity, recency, personalization).
Deep-dives and trade-offs that separate staff-level answers
-
Indexing delay vs freshness. New data takes time to be searchable — usually seconds for Elasticsearch, longer at scale. If you need “search what was just posted 100ms ago,” you need a different architecture (in-memory side index merged at query time).
-
Sharding strategy. Shard by tenant, by document type, or by hash. Wrong sharding creates hot shards or scatter-gather query patterns that hurt latency.
-
Typo tolerance, stemming, and synonyms. Out of the box, search engines don’t know “iPhone 15 Pro” matches “iphone15pro” or “Apple’s newest phone.” Language processing matters.
-
Relevance tuning. Pure BM25 is rarely enough. You usually layer business signals (recency, popularity, click-through rate) and sometimes a learning-to-rank model on top.
-
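Pagination beyond page 100. Deep offset pagination breaks at scale: every shard must collect and sort from + size hits, and Elasticsearch rejects requests where from + size exceeds 10,000 by default (index.max_result_window). Use cursor-based pagination (search_after in Elasticsearch).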
-
Index size and cost. A 1 TB Elasticsearch cluster is expensive to keep hot. Frozen or cold tiers help for older data.
-
Reindexing. When you change the schema or analyzer, you need to rebuild the index. With billions of documents, this is a multi-day operation that needs a zero-downtime strategy (dual-write to old and new index, then cutover).
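A toy illustration of two ideas from this section: an inverted index (term to document ids) and `search_after`-style cursor pagination. This is not how Lucene stores postings or scores documents; the `popularity` field stands in for a relevance score, and the class is hypothetical. The important detail is the unique tiebreaker (here the doc id) in the sort order, which makes the cursor unambiguous.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Hit = Tuple[float, int]  # (score, doc_id)


class TinyIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # term -> doc ids
        self.popularity: Dict[int, float] = {}

    def add(self, doc_id: int, text: str, popularity: float = 0.0) -> None:
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.popularity[doc_id] = popularity

    def search(self, query: str, size: int = 10,
               search_after: Optional[Hit] = None) -> List[Hit]:
        terms = query.lower().split()
        if not terms:
            return []
        # AND semantics: documents containing every query term.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Sort by score descending, then doc_id ascending as the unique tiebreaker.
        ranked = sorted(((self.popularity[d], d) for d in matches),
                        key=lambda h: (-h[0], h[1]))
        if search_after is not None:
            # A real engine seeks directly to the cursor; a linear filter keeps this short.
            after = (-search_after[0], search_after[1])
            ranked = [h for h in ranked if (-h[0], h[1]) > after]
        return ranked[:size]
```

The client passes the last hit of one page as `search_after` for the next, so no page ever requires skipping an offset.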
The interview trap: suggesting “we’ll use a SQL LIKE query” or not explaining how data actually gets into the search index.
Pattern 6: File and Media Storage Systems
Examples that fall here: Photo sharing, video upload, document storage, profile picture service, “Design Instagram,” “Design Dropbox,” “Design YouTube upload.”
Signals to listen for
-
Large binary objects, often megabytes to gigabytes.
-
Need for transformations: thumbnails, transcoding, format conversion, virus scanning.
-
Access patterns vary from very hot (today’s viral video) to very cold (years-old photos nobody views).
-
Global access requires CDN distribution.
Core building blocks
-
Object storage: S3, GCS, Azure Blob. Cheap, durable, virtually unlimited.
-
Metadata store: A database for file IDs, ownership, permissions, tags, versions. Usually relational or document.
-
Processing pipeline: Triggered on upload — thumbnails, transcoding, virus scanning, content moderation. Usually async via queue or event-driven.
-
CDN: CloudFront, Cloudflare, Akamai for global edge delivery.
-
Upload service: Often supports multipart and resumable uploads for large files.
Deep-dives and trade-offs that separate staff-level answers
-
Files in DB vs object store. Storing files in Postgres works for small files at small scale and is operationally simpler. Beyond ~1 MB per file or ~100K files, object storage wins on cost and operational sanity.
-
Direct-to-S3 uploads. Routing uploads through your application server is wasteful. Pre-signed URLs let clients upload directly to S3, saving bandwidth and reducing your server’s role to metadata management.
-
Resumable and multipart uploads. A 5 GB upload over flaky mobile networks will fail. Multipart uploads chunk the file; resumable protocols let the client pick up where it left off.
-
Storage tiers. S3 Standard for hot, Infrequent Access for warm, Glacier for cold. Lifecycle policies automatically move data based on age or access patterns. Big cost savings.
-
Access control and signed URLs. Public buckets are a security disaster. Time-limited signed URLs let you grant temporary access to specific files.
-
Deduplication. Storing the same file uploaded by 10,000 users 10,000 times is wasteful. Content-addressable storage (hash the content, store once) is how Dropbox and similar services save petabytes.
-
CDN cache invalidation. When a user updates their profile picture, the old version is cached at edge nodes worldwide. URL versioning (profile.jpg?v=2 or profile-{hash}.jpg) is simpler and cheaper than purging.
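Content-addressable deduplication and URL versioning share one trick: derive the storage key from a hash of the bytes. A minimal whole-file sketch with hypothetical names (real systems often dedupe at the chunk level and keep reference counts in the metadata store rather than in memory):

```python
import hashlib
from typing import Dict


class ContentAddressableStore:
    """Stores each distinct blob once under its SHA-256 digest, with reference counts."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}    # stand-in for an object storage bucket
        self.refcounts: Dict[str, int] = {}  # stand-in for the metadata store

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data        # only the first upload costs storage
        self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
        return digest

    def release(self, digest: str) -> None:
        self.refcounts[digest] -= 1
        if self.refcounts[digest] == 0:      # last reference gone: safe to delete
            del self.refcounts[digest]
            del self.blobs[digest]


def versioned_url(cdn_base: str, user_id: str, digest: str, ext: str = "jpg") -> str:
    # New content means a new digest and therefore a new URL, so edge caches
    # never serve the old picture under the new name and no purge is needed.
    return f"{cdn_base}/avatars/{user_id}/profile-{digest[:16]}.{ext}"
```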
The interview trap: serving large files through your application servers, or not separating metadata from the blob.
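Keeping large files off your application servers usually means handing clients a time-limited signed URL. With S3 you would let the SDK generate it (for example boto3's `generate_presigned_url`, which uses SigV4). The sketch below is a hypothetical HMAC scheme for a self-hosted file server, only meant to show the mechanics: the signature covers the path and the expiry, so neither can be altered without invalidating it. The key and function names are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical key; in practice load it from a secret manager and rotate it.
SIGNING_KEY = b"example-signing-key"


def _signature(path: str, expires: int) -> str:
    message = f"{path}:{expires}".encode("utf-8")
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, path: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    issued = int(time.time()) if now is None else now
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{base_url}{path}?{query}"


def verify_url(url: str, now: Optional[int] = None) -> bool:
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    # compare_digest avoids leaking timing information about the expected signature.
    return hmac.compare_digest(sig, _signature(parts.path, expires)) and current < expires
```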
Pattern 7: Workflow and Job Orchestration
Examples that fall here: Order pipelines, onboarding flows, multi-step approval systems, document processing pipelines, ETL workflows, “Design Amazon’s order fulfillment,” “Design an insurance claim processing system.”
Signals to listen for
-
Multi-step flows that span multiple services and often multiple days.
-
Each step can succeed, fail, or time out — and needs retries or compensating actions.
-
Business cares about the full journey, not just individual API calls.
-
You need to answer “where is order #12345 right now?” at any moment.
Core building blocks
-
Orchestrator / saga coordinator: Temporal, Cadence, AWS Step Functions, or a custom state machine.
-
Durable state per workflow: The state of each workflow instance persists across restarts, deploys, and failures.
-
Idempotent operations: Every step must be safe to retry.
-
Timeouts, retries, and compensating logic: If step 3 fails after step 2 succeeded, you need a way to undo step 2.
-
Observability: Per-workflow tracing, status dashboards, ability to inspect stuck workflows.
Deep-dives and trade-offs that separate staff-level answers
-
Orchestration vs choreography. Orchestration uses a central coordinator that tells each service what to do. Choreography uses events — each service reacts to events and emits new ones. Orchestration is easier to reason about and debug; choreography is more decoupled but can become a “where is this event going?” nightmare at scale.
-
Saga pattern. When a distributed transaction can’t be ACID across services, you use a saga: a series of local transactions, each with a compensating action. Booking a trip: book flight → book hotel → charge card. If the card fails, you cancel the hotel and flight.
-
Where state lives. In Temporal, workflow state is rebuilt by replaying the workflow code against a durably persisted event history. In a custom system, you might persist state in a database after each step. Either way, you must survive restarts.
-
Long-running workflows. Some workflows run for days or weeks (onboarding, approvals). Holding things in memory doesn’t work. Event-driven resumption and durable timers are essential.
-
Handling stuck workflows. A workflow waiting for human approval might wait forever. You need timeouts, escalation paths, and a way for ops to manually intervene.
-
Replay and debugging. When something goes wrong, can you replay the workflow with the same inputs and see what happened? Temporal makes this a first-class feature.
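The trip-booking saga reduces to a small driver: run each local transaction in order and, on failure, run the compensations of the completed steps in reverse. This in-memory sketch (hypothetical names) leaves out what an orchestrator like Temporal adds: durable state between steps, retries, and timeouts.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensation)


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; if one fails, undo the completed ones in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            completed.append(step)
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            for done_name, _, compensate in reversed(completed):
                # Compensations must be idempotent and, in a real system, retried until they succeed.
                compensate()
                log.append(f"compensated:{done_name}")
            return False
    return True
```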
The interview trap: not naming it as a saga or workflow problem, or implementing the orchestration as a chain of synchronous HTTP calls that breaks the moment any service is slow.
Pattern 8: Transactional and Financial Systems
Examples that fall here: Payment gateways, wallets, order payments, refunds, internal ledgers, “Design Stripe,” “Design PayPal’s wallet,” “Design a banking system.”
Signals to listen for
-
Money is moving or balances are changing.
-
You must never lose or duplicate a transaction. Ever.
-
Regulators, auditors, and dispute handling matter — every change must be traceable.
-
Customers will notice and complain about any inconsistency.
Core building blocks
-
Strongly consistent database for balances and ledgers: Postgres, MySQL with strong isolation, or specialized databases like TigerBeetle.
-
Double-entry ledger model: Every transaction is a pair of debits and credits. The system is auditable by construction.
-
Idempotent operations: Charge, refund, cancel — all must be safe to retry with idempotency keys.
-
Transactional outbox: A pattern where you write the business change and the event to publish in the same database transaction. Then a separate process publishes the event. Avoids the “wrote to DB but failed to publish event” problem.
-
Reconciliation jobs: Periodic batch jobs that compare internal state with external providers (Stripe, banks). Discrepancies are flagged for investigation.
Deep-dives and trade-offs that separate staff-level answers
-
ACID across services is hard. A payment that spans your wallet service, a fraud service, and a third-party gateway can’t be wrapped in a single database transaction. Sagas and idempotent retries are how you get reliability without distributed ACID.
-
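Idempotency keys. What exactly do they protect? Usually they protect a specific operation (charge $50 from card X to merchant Y) from being executed twice if the client retries. The client generates a unique key per logical operation; the server stores the key together with the request parameters and the result of the first attempt, returns that stored result on retries, and rejects the same key reused with different parameters.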
-
Reconciliation as a first-class concern. Every payment system has a “the numbers don’t match” moment. Stripe says they charged 100 customers, your system says 99. Reconciliation jobs catch this. The deep question is what you do when you find a discrepancy — automatic correction is dangerous, manual investigation is slow.
-
Double-entry bookkeeping. Every money movement is two entries: debit one account, credit another. The sum of all entries must always be zero. This makes errors structurally detectable.
-
Eventual consistency in money. Some balances are eventually consistent — your available balance might be slightly stale while a transaction settles. The deep question is which views need to be strongly consistent (you can’t spend money you don’t have) vs eventually consistent (your monthly statement).
-
Audit log immutability. Ledgers are append-only. You never update a ledger entry. To correct an error, you add a reversing entry. This is non-negotiable for compliance.
-
Hot accounts. A merchant account receiving 10,000 payments per second becomes a hot row. Sharding ledger entries and computing balances at read time (or via rolled-up summaries) is how you scale.
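Double-entry bookkeeping, idempotency keys, and reversing entries fit together in a few lines. This in-memory sketch (hypothetical names) records signed amounts in integer cents, negative for money leaving an account, instead of formal debit/credit columns; a production ledger would enforce the same rules inside a database transaction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Entry = Tuple[str, str, int]  # (transaction id, account, signed amount in cents)


@dataclass
class Ledger:
    entries: List[Entry] = field(default_factory=list)  # append-only, never edited
    # idempotency key -> (request parameters, transaction id of the first attempt)
    idempotency: Dict[str, Tuple[Tuple[str, str, int], str]] = field(default_factory=dict)

    def transfer(self, key: str, source: str, destination: str, amount_cents: int) -> str:
        params = (source, destination, amount_cents)
        if key in self.idempotency:
            stored_params, txn_id = self.idempotency[key]
            if stored_params != params:
                raise ValueError("idempotency key reused with different parameters")
            return txn_id  # a retry: return the first result, post nothing new
        if amount_cents <= 0:
            raise ValueError("amount must be a positive number of cents")
        txn_id = f"txn-{len(self.idempotency) + 1}"
        # Two entries per movement; they sum to zero by construction.
        self.entries.append((txn_id, source, -amount_cents))
        self.entries.append((txn_id, destination, amount_cents))
        self.idempotency[key] = (params, txn_id)
        return txn_id

    def balance(self, account: str) -> int:
        return sum(amount for _, acct, amount in self.entries if acct == account)

    def total(self) -> int:
        return sum(amount for _, _, amount in self.entries)  # must always be 0
```

To correct a mistake you post a reversing transfer under a new key; existing entries are never updated.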
The interview trap: saying “we’ll use a database transaction” without explaining what happens when the transaction spans multiple services, or not mentioning reconciliation at all.
Pattern 9: Recommendation and Personalization Systems
Examples that fall here: Home feeds, “people you may know,” product recommendations, watch next, ads ranking, “Design YouTube recommendations,” “Design Netflix’s home page.”
Signals to listen for
-
The output is a ranked list, not a single record.
-
There’s a feedback loop — user behavior (clicks, views, purchases) feeds back into the system.
-
Heavy offline computation feeds light, low-latency online serving.
-
Quality is subjective and measured via A/B tests, not unit tests.
Core building blocks
-
Event collection pipeline: Captures clicks, views, purchases, dwell time. Usually streams through Kafka into a warehouse.
-
Feature store: Stores precomputed features about users, items, and context. Online (low-latency lookup) and offline (for training) views must be consistent.
-
Candidate generation: Reduces the universe (millions of items) to a few hundred candidates likely to be relevant. Often uses embeddings and approximate nearest neighbor search.
-
Ranking model: Scores and orders candidates. Usually a learned model — gradient-boosted trees, neural networks, or transformer-based.
-
Online feature service: Sub-10ms lookups for user and item features at request time.
Deep-dives and trade-offs that separate staff-level answers
-
Two-stage architecture is the standard. Candidate generation followed by ranking. Trying to score millions of items per request is infeasible. Candidate generation narrows the field; ranking does the heavy lifting on a smaller set.
-
Offline vs online vs near-online training. Batch training is simple but stale. Online learning reacts quickly but is unstable. Most production systems retrain daily or hourly, with online updates for high-volume features.
-
Cold start. A new user has no history. A new item has no engagement. Fallback strategies: popularity-based for users, content-based for items, exploration bonuses to surface new items.
-
Exploration vs exploitation. Always showing the highest-predicted items means you never learn about new items. Multi-armed bandits, epsilon-greedy, or Thompson sampling inject controlled randomness.
-
Feedback loops and filter bubbles. A model trained on what users click reinforces what users already click. Diversification, recency boosting, and serendipity injection counter this.
-
Training-serving skew. Features computed differently offline vs online produce wrong predictions. A shared feature store, with one feature definition feeding both training and serving, is the standard fix.
-
Latency budgets. You have maybe 100ms for the whole request. Feature lookup, candidate generation, ranking, and post-processing all fit in that. This drives architecture.
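The two-stage architecture in miniature: a cheap similarity pass narrows the catalog to `k` candidates, and a richer scorer orders only those. Brute-force cosine similarity stands in for an approximate nearest neighbor index, and hand-set linear weights stand in for a learned ranking model; the function names, item vectors, and features are made up for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generate_candidates(user_vec: Vector, item_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1 (recall): keep the k items most similar to the user embedding."""
    return sorted(item_vecs, key=lambda item: cosine(user_vec, item_vecs[item]),
                  reverse=True)[:k]


def rank(candidates: List[str], user_vec: Vector, item_vecs: Dict[str, Vector],
         features: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Stage 2 (precision): a richer score, computed only for the candidates."""
    def score(item: str) -> float:
        f = features[item]
        return (weights["similarity"] * cosine(user_vec, item_vecs[item])
                + weights["ctr"] * f["ctr"]
                + weights["freshness"] * f["freshness"])
    return sorted(candidates, key=score, reverse=True)
```

Note the trade-off this exposes: an item filtered out in stage 1 can never be rescued by stage 2, however strong its other signals.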
The interview trap: jumping straight into “we’ll train a deep neural network” without addressing cold start, latency, or how features actually flow from logs to serving.
Pattern 10: Events and Notification Systems
Examples that fall here: Email and push notifications, SMS alerts, in-app messages, digest emails, “Design a notification system for a social network,” “Design a transactional email system.”
Signals to listen for
-
The same event can trigger messages on many channels (email, push, SMS, in-app).
-
Timing matters: immediate, delayed, batched, or scheduled for a future time.
-
User preferences and suppression rules are complex — opt-outs, quiet hours, frequency caps.
-
Different message types have different reliability needs (OTP must arrive in 30 seconds; marketing can be best-effort).
Core building blocks
-
Event bus: Product systems publish events here instead of sending emails directly.
-
Notification router: Maps events to templates, applies user preferences, decides which channels to use.
-
Channel-specific providers: SendGrid for email, FCM/APNs for push, Twilio for SMS, etc.
-
Preference store: Per-user, per-channel, per-message-type opt-in status, quiet hours, frequency caps.
-
Delivery tracking and outbox: Records what was sent, when, to whom, and the delivery status.
Deep-dives and trade-offs that separate staff-level answers
-
Product services should never send emails directly. They publish events. The notification service decides whether, how, and when to notify. This is the single most important architectural decision and is the difference between a manageable system and a nightmare.
-
Deduplication. A “new message” notification might be triggered by 5 different events in 2 seconds. Deduplication keys (user_id + notification_type + content_hash + time_bucket) collapse these into a single notification.
-
Batching and digests. Sending 50 individual notifications for 50 likes annoys users. Batching them into “you got 50 likes today” reduces fatigue and cost. The deep question is the trade-off between immediacy and noise.
-
Different SLAs per channel and message type. OTPs need at-least-once delivery in under 30 seconds with SMS fallback. Marketing emails can tolerate hours of delay. Your architecture needs separate pipelines or priority lanes.
-
Provider failover. SendGrid goes down. You need to fail over to SES or Mailgun. This requires templates that work across providers, careful suppression list management (you don’t want to email someone who unsubscribed via the failed provider), and bounce handling.
-
Tracking delivery, opens, and clicks. Webhooks from providers feed back into your system. This data is huge and noisy — design the ingestion pipeline carefully.
-
Quiet hours and time zones. Sending a notification at 3 AM in the user’s local time is a great way to lose them. Time zone handling per user is essential.
-
Idempotency on event consumption. The same event might be delivered twice from Kafka. The notification service must dedupe before sending.
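Deduplication keys and quiet hours can be sketched as below. The key follows the user_id + notification_type + content_hash + time_bucket recipe from the deduplication deep-dive; the 5-minute bucket, the 22:00 to 07:00 window, and the class are illustrative assumptions. Fixed UTC offsets keep the example self-contained but ignore daylight saving time; a real system would store an IANA time zone per user and use `zoneinfo`.

```python
import hashlib
from datetime import datetime, time, timedelta, timezone
from typing import List, Set, Tuple


def dedupe_key(user_id: str, notification_type: str, content: str,
               at: datetime, bucket_seconds: int = 300) -> str:
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    # Events in the same 5-minute bucket collapse; two events straddling a
    # bucket boundary still produce different keys.
    bucket = int(at.timestamp()) // bucket_seconds
    return f"{user_id}:{notification_type}:{content_hash}:{bucket}"


def in_quiet_hours(now_utc: datetime, utc_offset_hours: float,
                   start: time = time(22, 0), end: time = time(7, 0)) -> bool:
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours))).time()
    if start <= end:
        return start <= local < end
    return local >= start or local < end  # the window wraps past midnight


class Notifier:
    def __init__(self) -> None:
        self.seen: Set[str] = set()                 # in production: expiring keys in a shared store
        self.sent: List[Tuple[str, str]] = []
        self.deferred: List[Tuple[str, str]] = []   # to be rescheduled after quiet hours

    def notify(self, user_id: str, notification_type: str, content: str,
               now_utc: datetime, utc_offset_hours: float) -> str:
        key = dedupe_key(user_id, notification_type, content, now_utc)
        if key in self.seen:
            return "deduplicated"
        self.seen.add(key)
        if in_quiet_hours(now_utc, utc_offset_hours):
            self.deferred.append((user_id, content))
            return "deferred"
        self.sent.append((user_id, content))
        return "sent"
```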
The interview trap: drawing arrows from every product service directly to email/push providers, or not separating preferences from delivery logic.
Pattern 11: Geo and Location-Based Systems
Examples that fall here: Uber driver matching, Yelp nearby restaurants, food delivery routing, geofencing, ride-sharing surge pricing, “Design Uber,” “Design a ‘find friends nearby’ feature.”
Signals to listen for
-
Entities (users, drivers, places) have geographic coordinates that matter.
-
Queries are spatial: “find things near me,” “who is in this area,” “what’s the closest X.”
-
Some entities move continuously and need frequent location updates.
-
Distance, time decay, and direction often matter together.
Core building blocks
-
Spatial index: Geohashing, QuadTree, R-tree, or H3 (Uber’s hexagonal grid). Allows efficient “find nearby” queries without scanning every location.
-
Location update stream: High-throughput pipeline for moving entities to report position every few seconds.
-
Proximity matching service: Given a query point, find candidate entities within a radius.
-
Geo-partitioned data stores: Data sharded by region or cell so queries hit a small subset.
-
Routing and ETA service: Often a separate service that handles road graph routing, usually backed by OSRM, Valhalla, or commercial APIs.
Deep-dives and trade-offs that separate staff-level answers
-
Choice of spatial index. Geohashing is simple but has the edge-cell problem — two points 10 meters apart can land in different cells if they’re near a boundary. QuadTrees adapt to density but are harder to shard. H3 (hexagons) has uniform neighbor distances and is what Uber uses for a reason.
-
Update frequency vs battery vs accuracy. Drivers reporting location every 1 second is accurate but kills battery and creates massive ingest load. Every 30 seconds is cheap but stale. Adaptive update intervals (more frequent when moving fast, less when stationary) are common.
-
Hot cells. Manhattan at rush hour has 10,000 drivers in a few cells. Times Square on New Year’s Eve has 100,000 phones requesting nearby Ubers. You need per-cell sharding and load shedding.
-
Geo-fenced services. Surge pricing, regulatory rules, and supply/demand calculations are all per-region. Your architecture needs regional partitioning, not just sharding.
-
Matching algorithms. Nearest driver isn’t always best — you want lowest ETA, which depends on traffic, direction, and driver acceptance rate. This becomes an optimization problem, not a simple proximity query.
-
Eventual consistency on location. Two riders requesting at the same moment might both be “matched” to the same driver. You need a locking or reservation mechanism with a short TTL.
-
Stale data on disconnect. A driver’s phone dies. Their last location is now 5 minutes old. How long do you keep showing them as available?
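The reservation mechanism from the eventual-consistency point can be sketched as an atomic check-and-set with an expiry. This is an in-memory illustration with several assumptions. `DriverReservations` and its methods are hypothetical names. A lock inside a single process stands in for a shared store. The clock is injectable so that expiry can be tested. In a real deployment, something like a Redis `SET` with the `NX` and `PX` options might play the same role.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class DriverReservations:
    """Hypothetical in-memory reservation store: one live hold per driver, with a TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._holds: Dict[str, Tuple[str, float]] = {}  # driver_id -> (rider_id, expires_at)

    def try_reserve(self, driver_id: str, rider_id: str) -> bool:
        """Atomically hold a driver for one rider; fail if another rider has a live hold."""
        with self._lock:
            now = self._clock()
            hold = self._holds.get(driver_id)
            if hold is not None and hold[1] > now and hold[0] != rider_id:
                return False  # someone else holds an unexpired reservation
            # Free, expired, or re-reserved by the same rider: (re)start the TTL.
            self._holds[driver_id] = (rider_id, now + self._ttl)
            return True

    def holder(self, driver_id: str) -> Optional[str]:
        """Return the rider currently holding the driver, or None if the hold expired."""
        with self._lock:
            hold = self._holds.get(driver_id)
            if hold is None or hold[1] <= self._clock():
                return None
            return hold[0]


# Usage with a fake clock:
now = [0.0]
reservations = DriverReservations(ttl_seconds=10, clock=lambda: now[0])
print(reservations.try_reserve("driver-42", "rider-a"))  # True
print(reservations.try_reserve("driver-42", "rider-b"))  # False: rider-a holds it
now[0] = 11.0
print(reservations.try_reserve("driver-42", "rider-b"))  # True: the first hold expired
```

The short TTL prevents a rider who abandons the flow from pinning a driver indefinitely. It also bounds how long a stale match can block other riders.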
The interview trap: doing a brute-force Haversine distance calculation over all drivers, or not addressing what happens at scale in dense cities.
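One way to avoid the brute-force trap has two steps. First, bucket locations into grid cells. Then run the exact Haversine check only on candidates from cells that overlap the search circle.

The sketch makes several assumptions. The Earth is treated as a sphere with radius 6,371 km. Cells are fixed-size latitude/longitude squares. Queries stay away from the poles and the ±180° meridian. `GridIndex` and `haversine_km` are hypothetical names, not a library API.

Two details keep the candidate set complete. The longitude span of the search box widens with latitude, as asin(sin δ / cos φ) for angular radius δ at latitude φ. Scanning every cell the box touches also covers the edge-cell problem.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class GridIndex:
    """Buckets points into fixed-size lat/lon cells. Does not handle the poles
    or the ±180° meridian."""

    def __init__(self, cell_deg: float = 0.01):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)  # (row, col) -> [(item_id, lat, lon), ...]

    def _cell(self, lat: float, lon: float) -> tuple:
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def add(self, item_id: str, lat: float, lon: float) -> None:
        self.cells[self._cell(lat, lon)].append((item_id, lat, lon))

    def nearby(self, lat: float, lon: float, radius_km: float) -> list:
        # Bounding box of the search circle: the angular radius in latitude, and
        # the (wider) longitude span of the circle at this latitude.
        delta = radius_km / EARTH_RADIUS_KM
        dlat = math.degrees(delta)
        dlon = math.degrees(math.asin(min(1.0, math.sin(delta) / math.cos(math.radians(lat)))))
        row_lo, col_lo = self._cell(lat - dlat, lon - dlon)
        row_hi, col_hi = self._cell(lat + dlat, lon + dlon)
        hits = []
        # Scan every cell the box touches (so points just across a cell
        # boundary are not missed), then apply the exact distance check.
        for row in range(row_lo, row_hi + 1):
            for col in range(col_lo, col_hi + 1):
                for item_id, plat, plon in self.cells.get((row, col), ()):
                    if haversine_km(lat, lon, plat, plon) <= radius_km:
                        hits.append(item_id)
        return hits


index = GridIndex(cell_deg=0.01)
index.add("A", 40.7585, -73.9850)  # about 70 m from the query point
index.add("B", 40.7800, -73.9700)  # about 2.8 km away
print(index.nearby(40.7580, -73.9855, radius_km=1.0))  # ['A']
```

A geohash prefix or an H3 cell ID could replace the `(row, col)` key without changing the two-phase shape. The cheap cell lookup comes first, and the exact distance check runs only on the candidates it returns.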
Pattern 12: Distributed Coordination and Consensus
Examples that fall here: Distributed task schedulers, cron-at-scale, configuration propagation, distributed rate limiters, leader election, distributed locks, “Design a distributed job scheduler,” “Design a global rate limiter.”
Signals to listen for
-
Multiple nodes need to agree on something.
-
Exactly one node must perform a job (no duplicates, no misses).
-
Global state needs to propagate reliably to many nodes.
-
Failure modes include split brain, network partitions, and stale leaders.
Core building blocks
-
Consensus systems: ZooKeeper, etcd, or Consul, or services built directly on Raft or Paxos.
-
Lease-based ownership: A node holds a lease for a fixed time. If it doesn’t renew, the lease expires and someone else takes over.
-
Heartbeats: Workers signal liveness periodically.
-
Fencing tokens: Monotonically increasing tokens that prevent a stale leader from doing damage after a new leader has taken over.
-
Idempotent job execution: A job that was started by a now-dead worker can be safely retried without applying its effects twice.
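Lease-based ownership and fencing tokens fit together in a few dozen lines. This is a single-process simulation with several assumptions. `LeaseService` stands in for a coordination service such as ZooKeeper or etcd. `FencedStore` stands in for the protected resource. Both names are hypothetical, and time comes from an injectable clock. Each successful acquisition returns a strictly larger token. The store refuses any write whose token is older than the highest token it has seen.

```python
import threading
import time
from typing import Callable, Optional


class LeaseService:
    """Hypothetical stand-in for a coordination service that hands out leases
    with monotonically increasing fencing tokens."""

    def __init__(self, lease_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._lease_seconds = lease_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._owner: Optional[str] = None
        self._expires_at = 0.0
        self._token = 0

    def acquire(self, node: str) -> Optional[int]:
        """Return a new fencing token if the lease is free or expired, else None."""
        with self._lock:
            now = self._clock()
            if self._owner is not None and self._owner != node and now < self._expires_at:
                return None  # another node holds a live lease
            self._owner, self._expires_at = node, now + self._lease_seconds
            self._token += 1
            return self._token

    def renew(self, node: str, token: int) -> bool:
        """Extend the lease only if `node` still holds it under `token`."""
        with self._lock:
            now = self._clock()
            if self._owner == node and self._token == token and now < self._expires_at:
                self._expires_at = now + self._lease_seconds
                return True
            return False


class FencedStore:
    """The protected resource: remembers the highest token seen and rejects older ones."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # stale holder: a newer lease holder has already written
        self.highest_token = token
        self.data[key] = value
        return True


now = [0.0]
svc = LeaseService(lease_seconds=10, clock=lambda: now[0])
store = FencedStore()
t1 = svc.acquire("node-a")             # 1
now[0] = 30.0                          # node-a stalls (e.g. a long GC pause); its lease lapses
t2 = svc.acquire("node-b")             # 2
print(store.write(t2, "k", "from b"))  # True
print(store.write(t1, "k", "from a"))  # False: node-a's token is stale
```

The check lives in the resource, not in the lock holder. node-a cannot know that its lease lapsed while it was paused, so only the store can catch the stale write.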
Deep-dives and trade-offs that separate staff-level answers
-
Distributed locks are tricky. A lock with no fencing is dangerous. If a node holds a lock, GC-pauses for 30 seconds, and the lock expires, someone else takes it. The original node wakes up still thinking it has the lock and writes corrupt data. Fencing tokens (every lock acquisition gets a number; the resource rejects writes with old numbers) are the fix.
-
Lease expiry and clock skew. Leases depend on time. Clocks drift. A lease of 10 seconds on one machine might be 12 seconds on another. Design margins for this.
-
Split brain. During a network partition, two nodes might both think they’re the leader. Consensus algorithms (Raft, Paxos) prevent this by requiring a majority quorum. Clusters therefore usually run an odd number of nodes (typically 3 or 5). Going from 3 nodes to 4 raises the quorum from 2 to 3 but still tolerates only one failure.
-
Rate limiter design. A single global counter in Redis is simple but becomes a bottleneck. Per-node token buckets with periodic reconciliation scale better, but they can let some requests through over the limit between reconciliations. The trade-off is precision versus scalability.
-
Scheduler reliability. A naive cron-at-scale runs the same job multiple times if the scheduler is replicated for HA. You need leader election to ensure only one scheduler triggers each job, plus idempotency in the job itself as a safety net.
-
Configuration propagation. When a config changes, every node needs to know. Polling is simple but slow. Push via ZooKeeper watches is fast but creates load spikes. Hybrid approaches (push notification, then poll for details) are common.
-
Membership changes. Adding or removing nodes from a cluster is non-trivial. Consensus systems handle this carefully because a botched membership change can break quorum.
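For the rate limiter trade-off, the per-node building block is usually a token bucket. The sketch below is a single-threaded illustration with an injectable clock, and `TokenBucket` is a hypothetical class. It leaves out how the global limit is divided among nodes and periodically reconciled, which is where the over-limit slack described above comes from.

```python
import time
from typing import Callable


class TokenBucket:
    """Per-node token bucket. `capacity` caps bursts; `refill_per_sec` sets the
    sustained rate. Not thread-safe: a sketch for a single thread or event loop."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = capacity  # start full
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Add tokens for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.refill_per_sec)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


now = [0.0]
bucket = TokenBucket(capacity=5, refill_per_sec=1, clock=lambda: now[0])
print([bucket.allow() for _ in range(6)])  # [True, True, True, True, True, False]
now[0] = 1.0
print(bucket.allow())                      # True: one token refilled after 1 s
```

In a distributed setup, each node's `refill_per_sec` would be its share of the global limit. That share is what periodic reconciliation would adjust.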
The interview trap: saying “we’ll use a distributed lock” without explaining what happens when the lock holder dies, or not mentioning consensus when the problem fundamentally requires it.
Conclusion
If you take just one thing from this cheat sheet, take this: in system design, clarity beats complexity.
When you label the pattern first, you stop guessing. You choose the right primitives, you explain the right trade-offs, and you sound like someone who has built systems, not just read about them.
The 30-second classifier at the top isn’t just an interview trick. It’s how senior engineers actually approach unknown systems in real life. The pattern tells you which primitives matter, which failure modes to worry about, and which conversations to have with your team.
A mid-level engineer picks tools. A senior engineer explains trade-offs. A staff or principal engineer classifies the pattern first and lets the constraints choose the architecture.
Bookmark this. Use it as a pre-flight checklist for every system design question you practice. Better yet, use it the next time you’re designing something at work — not just in an interview.
Also, feel free to reach out to me if you’re preparing for a switch, want to chat about interview preparation, or how to move to the next level in your career: https://topmate.io/puneet_patwari/
If you’re preparing for Senior to Principal-level system design interviews, I’ve put together 90+ fundamentals like this into a guide.
You can check it out here: puneetpatwari.in