System Design Patterns & Principles

A comprehensive guide to system design patterns, principles, and best practices for building scalable, reliable, and maintainable systems.

Core Concepts

Performance Metrics

Latency: Time to complete a single operation
Throughput: Number of operations per unit time
Bandwidth: Maximum data transfer rate
Availability: Percentage of uptime (99.9% = 43.8 minutes downtime/month)

Availability Tiers:
- 99%     = 7.2 hours downtime/month
- 99.9%   = 43.8 minutes downtime/month
- 99.99%  = 4.38 minutes downtime/month
- 99.999% = 26.3 seconds downtime/month

Latency Numbers

L1 cache reference:              0.5 ns
Branch mispredict:               5 ns
L2 cache reference:              7 ns
Mutex lock/unlock:              25 ns
Main memory reference:         100 ns
Compress 1KB with Snappy:    3,000 ns
Send 1KB over 1 Gbps:       10,000 ns
Read 4KB from SSD:         150,000 ns
Read 1MB sequentially:     250,000 ns
Round trip in datacenter:  500,000 ns
Read 1MB from SSD:       1,000,000 ns
Disk seek:              10,000,000 ns
Read 1MB from disk:     20,000,000 ns
Send packet US->EU:    150,000,000 ns

System Design Principles

KISS - Keep It Simple, Stupid
DRY - Don't Repeat Yourself
YAGNI - You Aren't Gonna Need It
Separation of Concerns
Single Responsibility Principle
Fail Fast
Idempotency
Immutability
Eventual Consistency
Defense in Depth

Scalability

Vertical Scaling (Scale Up)

Pros:
- Simple to implement
- No application changes needed
- Maintains data consistency

Cons:
- Hardware limits
- Single point of failure
- Expensive
- Downtime during upgrade

Use Cases:
- Legacy applications
- Databases (up to a point)
- Applications with tight coupling

Horizontal Scaling (Scale Out)

Pros:
- Nearly unlimited scaling
- Better fault tolerance
- Cost-effective (commodity hardware)
- No downtime

Cons:
- Complex architecture
- Data consistency challenges
- Network overhead
- State management

Use Cases:
- Web applications
- Stateless services
- Distributed systems

Load Distribution Patterns

Round Robin:
  - Simple rotation through servers
  - Equal distribution
  - No server state awareness

Least Connections:
  - Route to server with fewest active connections
  - Better for long-lived connections
  - Requires connection tracking

IP Hash:
  - Route based on client IP hash
  - Session persistence
  - Uneven distribution possible

Weighted:
  - Distribute based on server capacity
  - Handles heterogeneous servers
  - Dynamic weight adjustment

Least Response Time:
  - Route to fastest responding server
  - Optimal performance
  - Requires health checking

Database Design

SQL vs NoSQL

SQL (Relational):
  Best For:
    - Complex queries
    - ACID transactions
    - Structured data
    - Relationships
  Examples:
    - PostgreSQL
    - MySQL
    - Oracle

NoSQL:
  Document Store (MongoDB, CouchDB):
    - Flexible schema
    - JSON-like documents
    - Good for hierarchical data

  Key-Value (Redis, DynamoDB):
    - Simple lookups
    - High performance
    - Caching

  Column-Family (Cassandra, HBase):
    - Write-heavy workloads
    - Time-series data
    - Wide rows

  Graph (Neo4j, Amazon Neptune):
    - Relationship-heavy data
    - Social networks
    - Recommendation engines

Database Sharding

Vertical Sharding (by feature):
  Users DB:
    - users table
    - profiles table
    - authentication table

  Products DB:
    - products table
    - inventory table
    - categories table

Horizontal Sharding (by data):
  User Shard 1:
    - user_id 1-1000000

  User Shard 2:
    - user_id 1000001-2000000

  User Shard 3:
    - user_id 2000001-3000000

Sharding Strategies:
  Hash-based:
    - shard = hash(user_id) % num_shards
    - Even distribution
    - Hard to add shards

  Range-based:
    - shard_1: 0-999999
    - shard_2: 1000000-1999999
    - Easy to add shards
    - Potential hotspots

  Directory-based:
    - Lookup table maps keys to shards
    - Flexible
    - Additional lookup overhead

Replication Patterns

Master-Slave (Primary-Replica):
  Setup:
    - One master handles writes
    - Multiple slaves handle reads
    - Async replication

  Pros:
    - Read scalability
    - Simple setup

  Cons:
    - Write bottleneck
    - Replication lag
    - Single point of failure

Master-Master (Multi-Master):
  Setup:
    - Multiple masters accept writes
    - Bidirectional replication

  Pros:
    - Write scalability
    - High availability

  Cons:
    - Conflict resolution
    - Complex setup
    - Eventual consistency

Peer-to-Peer:
  Setup:
    - All nodes are equal
    - Distributed consensus

  Pros:
    - No single point of failure
    - High availability

  Cons:
    - Complex
    - Network overhead

Caching Strategies

Cache Patterns

Cache-Aside (Lazy Loading):
  Flow:
    1. Check cache
    2. If miss, read from DB
    3. Write to cache
    4. Return data

  Pros:
    - Only cache what's needed
    - Cache failures don't crash system

  Cons:
    - Initial request slow
    - Potential cache stampede

Write-Through:
  Flow:
    1. Write to cache
    2. Write to database
    3. Return success

  Pros:
    - Cache always consistent
    - Read performance

  Cons:
    - Write latency
    - Unused data cached

Write-Behind (Write-Back):
  Flow:
    1. Write to cache
    2. Return success
    3. Async write to DB

  Pros:
    - Fast writes
    - Batch DB writes

  Cons:
    - Data loss risk
    - Complex implementation

Refresh-Ahead:
  Flow:
    1. Predict needed data
    2. Pre-load into cache
    3. Serve from cache

  Pros:
    - Always fast reads
    - Reduced latency

  Cons:
    - Prediction complexity
    - Wasted cache on wrong predictions

Cache Eviction Policies

LRU (Least Recently Used):
  - Remove least recently accessed
  - Good for general use
  - O(1) with HashMap + LinkedList

LFU (Least Frequently Used):
  - Remove least frequently accessed
  - Good for varied access patterns
  - More complex tracking

FIFO (First In First Out):
  - Remove oldest entry
  - Simple implementation
  - May remove hot data

TTL (Time To Live):
  - Remove after expiration
  - Good for time-sensitive data
  - Prevents stale data

Caching Levels

Client-Side:
  - Browser cache
  - Mobile app cache
  - Fast access
  - Limited control

CDN (Content Delivery Network):
  - Geographic distribution
  - Static content
  - Reduced latency

Application Cache:
  - Redis, Memcached
  - Shared across instances
  - Flexible data structures

Database Cache:
  - Query cache
  - Buffer pool
  - Automatic management

Load Balancing

Layer 4 vs Layer 7

Layer 4 (Transport):
  - TCP/UDP level
  - IP + Port routing
  - Fast performance
  - Simple routing
  - No content awareness

  Example: HAProxy, AWS NLB

Layer 7 (Application):
  - HTTP/HTTPS level
  - Content-based routing
  - URL path routing
  - Header inspection
  - SSL termination

  Example: Nginx, AWS ALB

Load Balancer Algorithms

Round Robin:
  def get_server(servers, counter):
    return servers[counter % len(servers)]

Weighted Round Robin:
  def get_server(servers_with_weights):
    total = sum(weight for _, weight in servers_with_weights)
    rand = random(0, total)
    for server, weight in servers_with_weights:
      if rand < weight:
        return server
      rand -= weight

Least Connections:
  def get_server(servers):
    return min(servers, key=lambda s: s.active_connections)

IP Hash:
  def get_server(servers, client_ip):
    hash_value = hash(client_ip)
    return servers[hash_value % len(servers)]

Health Checks

Active Health Checks:
  HTTP:
    - GET /health
    - Expect 200 OK
    - Interval: 10s
    - Timeout: 5s
    - Unhealthy threshold: 3
    - Healthy threshold: 2

  TCP:
    - Establish connection
    - Check port open
    - Fast but limited

Passive Health Checks:
  - Monitor actual traffic
  - Error rate threshold
  - Response time threshold
  - Circuit breaker pattern

API Design

REST API Best Practices

HTTP Methods:
  GET:    Retrieve resource(s)
  POST:   Create resource
  PUT:    Update/Replace resource
  PATCH:  Partial update
  DELETE: Delete resource

Status Codes:
  200: OK
  201: Created
  204: No Content
  400: Bad Request
  401: Unauthorized
  403: Forbidden
  404: Not Found
  500: Internal Server Error
  503: Service Unavailable

URL Design:
  Good:
    GET    /users
    GET    /users/123
    POST   /users
    PUT    /users/123
    DELETE /users/123
    GET    /users/123/orders

  Bad:
    GET /getUsers
    GET /user/123/delete
    POST /createUser

API Versioning

URI Versioning:
  - /api/v1/users
  - /api/v2/users
  - Simple and clear
  - URL pollution

Header Versioning:
  - GET /api/users
  - Accept: application/vnd.company.v1+json
  - Clean URLs
  - Less visible

Query Parameter:
  - /api/users?version=1
  - Backward compatible
  - Easy to default

Rate Limiting

Algorithms:

Token Bucket:
  class TokenBucket:
    def __init__(self, capacity, refill_rate):
      self.capacity = capacity
      self.tokens = capacity
      self.refill_rate = refill_rate

    def allow_request(self):
      self.refill()
      if self.tokens >= 1:
        self.tokens -= 1
        return True
      return False

Leaky Bucket:
  - Fixed output rate
  - Queue incoming requests
  - Drop on overflow

Fixed Window:
  - Count requests per time window
  - Simple but allows bursts
  - Reset at window boundary

Sliding Window:
  - Weighted count across windows
  - Smooth rate limiting
  - More complex

Headers:
  X-RateLimit-Limit: 100
  X-RateLimit-Remaining: 87
  X-RateLimit-Reset: 1640000000

Microservices

Microservices Patterns

Service Decomposition:
  By Business Capability:
    - Order Service
    - Payment Service
    - Inventory Service
    - User Service

  By Subdomain (DDD):
    - Bounded contexts
    - Domain models
    - Ubiquitous language

Communication:
  Synchronous:
    - REST API
    - gRPC
    - Direct calls
    - Tight coupling

  Asynchronous:
    - Message Queue
    - Event Bus
    - Pub/Sub
    - Loose coupling

Service Discovery:
  Client-Side:
    - Eureka, Consul
    - Client queries registry
    - Client-side load balancing

  Server-Side:
    - AWS ELB, Kubernetes Service
    - Load balancer queries registry
    - Transparent to client

Saga Pattern (Distributed Transactions)

Choreography:
  Order Service:
    → Creates order
    → Publishes OrderCreated event

  Payment Service:
    → Listens OrderCreated
    → Processes payment
    → Publishes PaymentCompleted

  Inventory Service:
    → Listens PaymentCompleted
    → Reserves items
    → Publishes ItemsReserved

Orchestration:
  Order Orchestrator:
    1. Call Order Service → Create order
    2. Call Payment Service → Process payment
    3. Call Inventory Service → Reserve items
    4. If any fails → Rollback all

  Compensation:
    - RefundPayment
    - ReleaseInventory
    - CancelOrder

Circuit Breaker Pattern

class CircuitBreaker:
    CLOSED = 'closed'      # Normal operation
    OPEN = 'open'          # Failing, reject requests
    HALF_OPEN = 'half_open'  # Testing recovery

    def __init__(self, failure_threshold=5, timeout=60):
        self.state = self.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.last_failure_time = None

    def call(self, func):
        if self.state == self.OPEN:
            if time.time() - self.last_failure_time >= self.timeout:
                self.state = self.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")

        try:
            result = func()
            self.on_success()
            return result
        except Exception as e:
            self.on_failure()
            raise e

    def on_success(self):
        self.failure_count = 0
        self.state = self.CLOSED

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = self.OPEN

Message Queues

Messaging Patterns

Point-to-Point (Queue):
  - Single consumer per message
  - Message deleted after consumption
  - Load balancing across consumers
  - Use: Task distribution

Publish-Subscribe (Topic):
  - Multiple consumers per message
  - Message retained
  - Broadcast to all subscribers
  - Use: Event notification

Message Ordering:
  FIFO:
    - First in, first out
    - Ordered processing
    - Single consumer

  Priority:
    - High priority first
    - SLA-based processing
    - Multiple queues

Dead Letter Queue:
  - Failed messages
  - Poison messages
  - Manual intervention
  - Debug & retry

Popular Message Queues

RabbitMQ:
  - AMQP protocol
  - Flexible routing
  - Complex topologies
  - Good for enterprise

Apache Kafka:
  - High throughput
  - Event streaming
  - Log aggregation
  - Good for big data

AWS SQS:
  - Fully managed
  - Highly scalable
  - Simple setup
  - Good for AWS ecosystem

Redis Pub/Sub:
  - In-memory
  - Very fast
  - Simple pub/sub
  - No persistence guarantee

Consistency & CAP Theorem

CAP Theorem

Pick 2 of 3:

Consistency (C):
  - All nodes see same data
  - Reads return latest write
  - Strong consistency

Availability (A):
  - Every request gets response
  - No guarantee it's latest
  - System always operational

Partition Tolerance (P):
  - System works despite network splits
  - Required in distributed systems
  - Must choose between C and A

Examples:
  CP Systems:
    - MongoDB (with default settings)
    - HBase
    - Redis (in cluster mode)
    - Prioritize consistency

  AP Systems:
    - Cassandra
    - DynamoDB
    - CouchDB
    - Prioritize availability

  CA Systems:
    - Traditional RDBMS
    - Only in single node
    - Not partition tolerant

Consistency Models

Strong Consistency:
  - Immediate consistency
  - Linearizability
  - Slow but accurate
  - Use: Banking, inventory

Eventual Consistency:
  - Eventually consistent
  - Fast but stale reads possible
  - Use: Social media feeds, analytics

Causal Consistency:
  - Causally related operations ordered
  - Others may be out of order
  - Middle ground

Read-Your-Writes:
  - User sees own writes
  - Others may not
  - Good for user experience

Design Patterns

Common Architectural Patterns

Layered Architecture:
  Presentation Layer (UI)
  ↓
  Business Logic Layer
  ↓
  Data Access Layer
  ↓
  Database

  Pros: Simple, testable
  Cons: Monolithic, scaling issues

Event-Driven Architecture:
  Producer → Event Bus → Consumer(s)

  Pros: Loose coupling, scalable
  Cons: Complex, debugging hard

CQRS (Command Query Responsibility Segregation):
  Commands (Write):
    → Write Model
    → Write Database

  Queries (Read):
    → Read Model
    → Read Database (optimized)

  Pros: Optimized reads/writes
  Cons: Complexity, eventual consistency

Serverless:
  API Gateway → Lambda Functions → Services

  Pros: Auto-scaling, pay-per-use
  Cons: Cold starts, vendor lock-in

Database Patterns

Database per Service:
  - Each microservice owns database
  - Data isolation
  - Independent scaling
  - Distributed transactions complex

Shared Database:
  - Multiple services share DB
  - Simple transactions
  - Tight coupling
  - Schema change coordination

Event Sourcing:
  - Store events, not state
  - Full audit trail
  - Replay events
  - Complex queries

Materialized View:
  - Pre-computed results
  - Fast queries
  - Stale data
  - Update overhead

Common System Designs

URL Shortener

Requirements:
  - Shorten URLs
  - Redirect to original URL
  - Analytics (optional)
  - Custom aliases
  - Expiration

Components:
  API Server:
    - POST /shorten
    - GET /{short_code}

  Database:
    - url_mappings table
      - id: BIGINT
      - short_code: VARCHAR(7)
      - original_url: TEXT
      - created_at: TIMESTAMP
      - expires_at: TIMESTAMP

  Cache (Redis):
    - Key: short_code
    - Value: original_url
    - TTL: 24 hours

Algorithm:
  Base62 Encoding:
    charset = "0-9a-zA-Z"
    id = auto_increment
    short_code = base62_encode(id)

  MD5 Hash:
    hash = md5(url + timestamp)
    short_code = hash[:7]
    handle collisions

Scaling:
  - Database sharding by short_code
  - Cache layer (Redis)
  - CDN for redirects
  - Rate limiting

Design Twitter

Requirements:
  - Post tweets (280 chars)
  - Follow users
  - Timeline (home & user)
  - Like, retweet
  - Search

Components:
  Services:
    - User Service
    - Tweet Service
    - Timeline Service
    - Search Service
    - Notification Service

  Databases:
    Users:
      - user_id, username, email, etc.

    Tweets:
      - tweet_id, user_id, content, created_at
      - Sharded by tweet_id

    Followers:
      - follower_id, followee_id
      - Sharded by follower_id

    Timelines (Redis):
      - user:{id}:timeline → [tweet_ids]
      - Cached, pre-computed

  Feed Generation:
    Fanout on Write (Push):
      - Write to all followers' timelines
      - Fast reads
      - Slow writes for popular users

    Fanout on Read (Pull):
      - Read from followed users
      - Fast writes
      - Slow reads

    Hybrid:
      - Push for normal users
      - Pull for celebrities

Scaling:
  - Read replicas for user data
  - Sharded tweet storage
  - Redis for timelines
  - Elasticsearch for search
  - Message queue for notifications

Design Chat System

Requirements:
  - 1-on-1 chat
  - Group chat
  - Online status
  - Message history
  - Push notifications

Components:
  WebSocket Server:
    - Persistent connections
    - Real-time messaging
    - Presence detection

  Message Service:
    - Store messages
    - Delivery status
    - Read receipts

  Presence Service:
    - Online/offline status
    - Last seen
    - Typing indicators

Databases:
  Messages:
    - message_id
    - conversation_id
    - sender_id
    - content
    - timestamp
    - Cassandra (time-series)

  Conversations:
    - conversation_id
    - participants
    - type (1-on-1, group)
    - PostgreSQL

  Connection Store:
    - user_id → server_id mapping
    - Redis for fast lookup

Flow:
  Send Message:
    1. Client → WebSocket Server
    2. Server validates & stores
    3. Server finds recipient's server
    4. Forward to recipient
    5. Push notification if offline

Scaling:
  - Multiple WebSocket servers
  - Service discovery for routing
  - Message queue for async processing
  - Database sharding by conversation_id

Additional Resources

System Design Primer
Designing Data-Intensive Applications by Martin Kleppmann
Web Architecture 101
High Scalability Blog

Last updated: 2025-11-16

System Design Patterns & Principles

Table of Contents

Core Concepts

Performance Metrics

Latency Numbers

System Design Principles

Scalability

Vertical Scaling (Scale Up)

Horizontal Scaling (Scale Out)

Load Distribution Patterns

Database Design

SQL vs NoSQL

Database Sharding

Replication Patterns

Caching Strategies

Cache Patterns

Cache Eviction Policies

Caching Levels

Load Balancing

Layer 4 vs Layer 7

Load Balancer Algorithms

Health Checks

API Design

REST API Best Practices

API Versioning

Rate Limiting

Microservices

Microservices Patterns

Saga Pattern (Distributed Transactions)

Circuit Breaker Pattern

Message Queues

Messaging Patterns

Popular Message Queues

Consistency & CAP Theorem

CAP Theorem

Consistency Models

Design Patterns

Common Architectural Patterns

Database Patterns

Common System Designs

URL Shortener

Design Twitter

Design Chat System

Additional Resources