Design Netflix

System Design Challenge

Difficulty: hard
Duration: 45-60 minutes
Tags: cdn, video-transcoding, recommendation-engine, content-delivery, user-profiles

What is Netflix?

Netflix is a video streaming platform that provides on-demand access to movies, TV shows, documentaries, and original content. It's similar to Amazon Prime Video, Disney+, or Hulu. The service provides video streaming, content recommendation, user profiles, and global content delivery.

What sets systems like Netflix apart is the combination of video streaming over content delivery networks (CDNs) with recommendation engines. Understanding this design prepares you for interview questions about similar streaming platforms, because the core challenges (video delivery, transcoding, recommendation algorithms, and global scalability) are the same.


Functional Requirements

Core (Interview Focussed)

  • Video Streaming: Users can stream videos with adaptive bitrate streaming.
  • Content Discovery: Browse and search through available content.
  • Recommendation Engine: Personalized content recommendations for users.
  • User Profiles: Multiple profiles per account with viewing history.

Out of Scope

  • User authentication and subscription management
  • Content creation and production workflows
  • Payment processing and billing
  • Mobile app specific features
  • Live streaming capabilities

Non-Functional Requirements

Core (Interview Focussed)

  • Low latency: Video start time under 2 seconds.
  • High availability: 99.9% uptime for streaming services.
  • Scalability: Handle millions of concurrent streams globally.
  • Quality: Adaptive streaming based on network conditions.

Out of Scope

  • Data retention policies
  • Compliance and privacy regulations

💡 Interview Tip: Focus on low latency, high availability, and scalability. Interviewers care most about CDN architecture, video transcoding, and recommendation algorithms.


Capacity Estimation

Scale

  • Users: 200 million subscribers globally
  • Concurrent streams: 50 million during peak hours
  • Content library: 15,000+ titles
  • Video storage: 100+ petabytes of content

Traffic

  • Peak concurrent users: 50 million
  • Average video bitrate: 5 Mbps
  • Total bandwidth: 250 Tbps during peak
  • Storage per video: 1-10 GB depending on quality
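
The traffic figures above follow from simple arithmetic. The sketch below reproduces them; the function names are illustrative, and the inputs are the estimates already listed.

```python
# Back-of-envelope check of the traffic figures listed above.
CONCURRENT_STREAMS = 50_000_000  # peak concurrent streams (from the estimate)
AVG_BITRATE_BPS = 5_000_000      # 5 Mbps average video bitrate


def peak_bandwidth_tbps(streams: int, bitrate_bps: int) -> float:
    """Aggregate egress bandwidth in terabits per second."""
    return streams * bitrate_bps / 1e12


def gigabytes_per_hour(bitrate_bps: int) -> float:
    """Data transferred by one stream in an hour, in decimal gigabytes."""
    return bitrate_bps * 3600 / 8 / 1e9


if __name__ == "__main__":
    print(peak_bandwidth_tbps(CONCURRENT_STREAMS, AVG_BITRATE_BPS))
    print(gigabytes_per_hour(AVG_BITRATE_BPS))
```

At 5 Mbps a stream moves about 2.25 GB per hour, so a two-hour film is roughly 4.5 GB. That sits inside the 1-10 GB per-video range quoted above.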

High-Level Design

System Architecture Diagram

The system consists of client applications, an API gateway for request routing, microservices for the individual functions, databases for data persistence, and a CDN for content delivery. The architecture supports horizontal scaling and global distribution.


Detailed Design

User Service

The User Service manages user profiles, authentication, and viewing history. It handles user registration, profile creation, and maintains viewing preferences. The service stores user data in a distributed database with read replicas for improved performance.

User profiles contain viewing history, preferences, ratings, and watchlist information. This data is crucial for the recommendation engine to provide personalized content suggestions. The service implements caching strategies to reduce database load and improve response times.
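
One common caching strategy for profile reads is cache-aside. The text does not say which strategy is used, so the sketch below is an assumption: `TTLCache` is a tiny in-memory stand-in for a cache such as Redis, and `load_from_db` is a hypothetical database loader.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory stand-in for a cache such as Redis (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def get_profile(profile_id: str, cache: TTLCache,
                load_from_db: Callable[[str], dict]) -> dict:
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    cached = cache.get(profile_id)
    if cached is not None:
        return cached
    profile = load_from_db(profile_id)  # hypothetical loader, e.g. a SQL query
    cache.set(profile_id, profile)
    return profile
```

The TTL bounds how stale a cached profile can get. A write path would normally also delete or overwrite the cached entry when a profile changes.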

Content Service

The Content Service manages the content catalog, metadata, and availability. It handles content ingestion, metadata extraction, and content organization. The service maintains information about movies, TV shows, genres, ratings, and availability across different regions.

Content metadata includes titles, descriptions, cast information, genres, ratings, and regional availability. The service supports content search and filtering capabilities, enabling users to discover content based on various criteria. It implements full-text search using Elasticsearch for efficient content discovery.

Recommendation Service

The Recommendation Service provides personalized content recommendations using collaborative filtering and content-based filtering algorithms. It analyzes user viewing patterns, ratings, and preferences to suggest relevant content.

The recommendation engine processes user behavior data to identify patterns and similarities between users and content. It uses machine learning models to predict user preferences and generate personalized recommendations. The service implements real-time and batch processing for different recommendation scenarios.
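
To make collaborative filtering concrete, here is a minimal sketch of the user-based variant with cosine similarity. This is one simple choice among many, not necessarily what a production engine uses; the rating data shape and function names are assumptions.

```python
from math import sqrt
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # profile_id -> {content_id: rating}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target: str, ratings: Ratings, top_n: int = 5) -> List[Tuple[str, float]]:
    """Score titles the target has not seen by similarity-weighted ratings of other profiles."""
    seen = ratings[target]
    scores: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(seen, other_ratings)
        if sim <= 0:
            continue  # no overlap in taste, so ignore this profile
        for content_id, rating in other_ratings.items():
            if content_id in seen:
                continue
            scores[content_id] = scores.get(content_id, 0.0) + sim * rating
            weights[content_id] = weights.get(content_id, 0.0) + sim
    ranked = [(cid, scores[cid] / weights[cid]) for cid in scores]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)[:top_n]
```

This pairwise loop is quadratic in the number of profiles, which is why large systems typically precompute similarities or learned embeddings in batch jobs and serve the results from a cache.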

Streaming Service

The Streaming Service handles video streaming requests and manages adaptive bitrate streaming. It coordinates with CDN to deliver video content efficiently and implements quality adaptation based on network conditions.

The service manages video streaming sessions, tracks playback progress, and handles quality switching. It implements adaptive bitrate streaming algorithms to optimize video quality based on available bandwidth. The service coordinates with CDN edge servers to minimize latency and improve streaming performance.
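
Quality switching can be illustrated with a throughput-based selection rule: pick the highest rendition whose bitrate fits under the measured throughput with a safety margin. This is a minimal sketch; the bitrate ladder and the 0.8 safety factor are illustrative assumptions, and real players usually also take buffer level into account.

```python
from typing import List, NamedTuple


class Rendition(NamedTuple):
    name: str
    bitrate_bps: int


# Hypothetical bitrate ladder; the bitrates are illustrative values.
LADDER: List[Rendition] = [
    Rendition("480p", 1_500_000),
    Rendition("720p", 3_000_000),
    Rendition("1080p", 5_000_000),
    Rendition("4K", 16_000_000),
]


def select_rendition(throughput_bps: float, ladder: List[Rendition] = LADDER,
                     safety_factor: float = 0.8) -> Rendition:
    """Throughput-based ABR: highest rendition that fits within a safety margin."""
    budget = throughput_bps * safety_factor
    ordered = sorted(ladder, key=lambda r: r.bitrate_bps)
    choice = ordered[0]  # always fall back to the lowest rendition
    for rendition in ordered:
        if rendition.bitrate_bps <= budget:
            choice = rendition
    return choice
```

With these numbers, a client measuring 4 Mbps would get 720p, because 80% of 4 Mbps is 3.2 Mbps and the 1080p rendition needs 5 Mbps.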

Content Delivery Network (CDN)

The CDN distributes video content globally using edge servers located close to users. It implements intelligent caching strategies and content replication to ensure fast content delivery worldwide.

CDN architecture includes origin servers, edge servers, and cache management systems. Edge servers cache popular content locally to reduce latency and bandwidth usage. The CDN implements geographic distribution and load balancing to optimize content delivery performance.
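
Edge caching can be sketched as a least-recently-used (LRU) cache in front of the origin. LRU is an assumed eviction policy here, since the text does not name one; `EdgeCache` and `fetch_from_origin` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """LRU cache of video segments, standing in for one CDN edge server."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self._capacity = capacity
        self._fetch = fetch_from_origin
        self._segments: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, segment_key: str) -> bytes:
        if segment_key in self._segments:
            self.hits += 1
            self._segments.move_to_end(segment_key)  # mark as most recently used
            return self._segments[segment_key]
        self.misses += 1
        data = self._fetch(segment_key)  # cache miss: go back to the origin
        self._segments[segment_key] = data
        if len(self._segments) > self._capacity:
            self._segments.popitem(last=False)  # evict least recently used
        return data
```

The `hits` and `misses` counters give the cache hit rate, which is the main signal for whether an edge server is sized well for its region's popular content.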

Video Transcoding Service

The Video Transcoding Service converts source videos into multiple formats and bitrates for adaptive streaming. It processes video content to create different quality versions suitable for various devices and network conditions.

Transcoding involves converting source videos into multiple resolutions (480p, 720p, 1080p, 4K) and bitrates. The service implements parallel processing and distributed transcoding to handle large volumes of content efficiently. It generates multiple streaming formats (HLS, DASH) for cross-platform compatibility.
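
For HLS, the output of transcoding is tied together by a master playlist that lists one variant stream per rendition. The sketch below generates such a playlist; the bitrates and the relative `playlist.m3u8` paths are illustrative assumptions, while `#EXTM3U` and `#EXT-X-STREAM-INF` with `BANDWIDTH` and `RESOLUTION` are standard HLS tags.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    name: str
    width: int
    height: int
    bandwidth_bps: int


# Illustrative ladder: resolutions follow the text, bitrates are assumptions.
VARIANTS: List[Variant] = [
    Variant("480p", 854, 480, 1_500_000),
    Variant("720p", 1280, 720, 3_000_000),
    Variant("1080p", 1920, 1080, 5_000_000),
    Variant("4k", 3840, 2160, 16_000_000),
]


def master_playlist(variants: List[Variant]) -> str:
    """Build an HLS master playlist that points at one media playlist per variant."""
    lines = ["#EXTM3U"]
    for v in variants:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},RESOLUTION={v.width}x{v.height}"
        )
        lines.append(f"{v.name}/playlist.m3u8")  # hypothetical relative path
    return "\n".join(lines) + "\n"
```

The player downloads this playlist first, then switches between the listed variants as network conditions change.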


Database Design

Database Choices

Primary Database: PostgreSQL

  • Rationale: ACID compliance for user data, content metadata, and transactional operations
  • Use Cases: User profiles, content metadata, subscription data, viewing history
  • Benefits: Strong consistency, complex queries, JSON support for flexible schemas

Recommendation Database: MongoDB

  • Rationale: Document-based storage for recommendation data and user behavior analytics
  • Use Cases: User preferences, recommendation models, behavioral data
  • Benefits: Flexible schema, horizontal scaling, aggregation pipelines

Search Database: Elasticsearch

  • Rationale: Full-text search and analytics for content discovery
  • Use Cases: Content search, user search, analytics queries
  • Benefits: Fast text search, faceted search, real-time analytics

Cache: Redis

  • Rationale: High-performance caching for frequently accessed data
  • Use Cases: User sessions, recommendation cache, content metadata cache
  • Benefits: Sub-millisecond latency, data structures, pub/sub capabilities

Table Design

Users Table (PostgreSQL)

| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
| user_id | UUID | PRIMARY KEY | - | Unique identifier for account |
| email | VARCHAR(255) | UNIQUE, NOT NULL | - | Primary contact and login credential |
| password_hash | VARCHAR(255) | NOT NULL | - | Hashed password |
| subscription_tier | VARCHAR(50) | NOT NULL | - | Subscription plan (Basic, Standard, Premium) |
| created_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Account creation time |
| updated_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Last update time |
| last_login | TIMESTAMP | - | - | Last login timestamp |
| is_active | BOOLEAN | - | TRUE | Account status |

Indexes:

  • idx_users_email on email
  • idx_users_subscription on subscription_tier

User Profiles Table (PostgreSQL)

| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
| profile_id | UUID | PRIMARY KEY | - | Unique identifier for profile |
| user_id | UUID | REFERENCES users(user_id) | - | Reference to user account |
| profile_name | VARCHAR(100) | NOT NULL | - | Display name for profile |
| avatar_url | VARCHAR(500) | - | - | Profile picture URL |
| language_preference | VARCHAR(10) | - | 'en' | Preferred content language |
| content_rating | VARCHAR(10) | - | 'PG-13' | Maximum content rating allowed |
| created_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Profile creation time |
| updated_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Last update time |

Indexes:

  • idx_profiles_user_id on user_id
  • idx_profiles_name on profile_name

Content Table (PostgreSQL)

| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
| content_id | UUID | PRIMARY KEY | - | Unique identifier for content |
| title | VARCHAR(500) | NOT NULL | - | Content title |
| description | TEXT | - | - | Content description |
| content_type | VARCHAR(50) | NOT NULL | - | Type (movie, tv_show, documentary) |
| genre_ids | INTEGER[] | - | - | Array of genre IDs |
| release_year | INTEGER | - | - | Year content was released |
| duration_seconds | INTEGER | - | - | Total runtime in seconds |
| rating | VARCHAR(10) | - | - | Content maturity rating |
| imdb_id | VARCHAR(20) | - | - | IMDb identifier |
| created_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Content creation time |
| updated_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Last update time |

Indexes:

  • idx_content_type on content_type
  • idx_content_genre on genre_ids (GIN index)
  • idx_content_year on release_year
  • idx_content_title on title (GIN tsvector index)

Viewing History Table (PostgreSQL)

| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
| history_id | UUID | PRIMARY KEY | - | Unique identifier for viewing record |
| profile_id | UUID | REFERENCES user_profiles(profile_id) | - | Reference to user profile |
| content_id | UUID | REFERENCES content(content_id) | - | Reference to content |
| watch_time_seconds | INTEGER | NOT NULL | - | Time spent watching in seconds |
| total_duration_seconds | INTEGER | NOT NULL | - | Total content duration |
| completion_percentage | DECIMAL(5,2) | - | - | Percentage of content watched |
| watched_at | TIMESTAMP | - | CURRENT_TIMESTAMP | When viewing occurred |
| device_type | VARCHAR(50) | - | - | Platform used for viewing |
| quality | VARCHAR(20) | - | - | Video quality watched |

Indexes:

  • idx_history_profile on profile_id
  • idx_history_content on content_id
  • idx_history_watched_at on watched_at
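
The completion_percentage column can be derived from watch_time_seconds and total_duration_seconds. The helper below is a sketch of one way to compute a value that fits DECIMAL(5,2); clamping to the 0-100 range is an assumption, made because rewatching can push watch time past the total duration.

```python
from decimal import Decimal, ROUND_HALF_UP


def completion_percentage(watch_time_seconds: int, total_duration_seconds: int) -> Decimal:
    """Percentage of a title watched, clamped to 0-100 and rounded to two decimals."""
    if total_duration_seconds <= 0:
        raise ValueError("total_duration_seconds must be positive")
    # Clamp so that rewatching or bad client data cannot exceed 100%.
    watched = min(max(watch_time_seconds, 0), total_duration_seconds)
    pct = Decimal(watched) * 100 / Decimal(total_duration_seconds)
    return pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Using `Decimal` rather than `float` keeps the stored value exact to two decimal places, matching the column type.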

Recommendations Collection (MongoDB)

| Field | Type | Description |
|---|---|---|
| _id | ObjectId | MongoDB document identifier |
| profile_id | String | Reference to user profile |
| recommendations | Array | List of recommended content |
| recommendations.content_id | String | Reference to recommended content |
| recommendations.score | Number | Confidence score (0-1) |
| recommendations.algorithm | String | Algorithm used (collaborative_filtering, content_based) |
| recommendations.reason | String | Explanation for recommendation |
| last_updated | Date | When recommendations were last updated |
| model_version | String | Version of recommendation model |

Example Document:

{
  "_id": ObjectId("..."),
  "profile_id": "profile_123",
  "recommendations": [
    {
      "content_id": "content_456",
      "score": 0.95,
      "algorithm": "collaborative_filtering",
      "reason": "similar_users_liked"
    }
  ],
  "last_updated": ISODate("2024-01-15T12:00:00Z"),
  "model_version": "v2.1"
}

Content Search Index (Elasticsearch)

| Field | Type | Analyzer | Description |
|---|---|---|---|
| content_id | keyword | - | Unique content identifier |
| title | text | english | Content title (searchable) |
| title.keyword | keyword | - | Content title (exact match) |
| description | text | english | Content description (searchable) |
| genres | keyword | - | Array of genre names |
| cast | keyword | - | Array of cast member names |
| release_year | integer | - | Year content was released |
| rating | float | - | Content rating score |
| content_type | keyword | - | Type (movie, tv_show, documentary) |

Index Configuration:

{
  "mappings": {
    "properties": {
      "content_id": {"type": "keyword"},
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "keyword": {"type": "keyword"}
        }
      },
      "description": {"type": "text", "analyzer": "english"},
      "genres": {"type": "keyword"},
      "cast": {"type": "keyword"},
      "release_year": {"type": "integer"},
      "rating": {"type": "float"},
      "content_type": {"type": "keyword"}
    }
  }
}
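
A search request against this mapping could combine full-text matching with an exact genre filter. The helper below only builds the query body as a plain dictionary. `build_search_query` is a hypothetical name and the title boost of 2 is an assumption; `bool`, `multi_match`, and `term` are standard Elasticsearch query clauses.

```python
from typing import Any, Dict, List, Optional


def build_search_query(text: str, genre: Optional[str] = None,
                       limit: int = 20) -> Dict[str, Any]:
    """Build an Elasticsearch query body for the content index mapping."""
    filters: List[Dict[str, Any]] = []
    if genre is not None:
        # genres is a keyword field, so term performs an exact match.
        filters.append({"term": {"genres": genre}})
    return {
        "size": limit,
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            # Boost title matches over description matches.
                            "fields": ["title^2", "description"],
                        }
                    }
                ],
                "filter": filters,
            }
        },
    }
```

Putting the genre in `filter` rather than `must` keeps it out of relevance scoring and lets Elasticsearch cache the filter.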

Core Entities

User Entity

The User entity represents a Netflix account holder with subscription information and authentication details. Each user can have multiple profiles for different family members or viewing preferences.

Key Attributes:

  • User ID: Unique identifier for the account
  • Email: Primary contact and login credential
  • Subscription Tier: Basic, Standard, or Premium plan
  • Account Status: Active, suspended, or cancelled
  • Creation Date: When the account was created

Relationships:

  • One-to-Many with User Profiles
  • One-to-Many with Viewing History (through profiles)
  • One-to-Many with Recommendations (through profiles)

User Profile Entity

The User Profile entity represents individual viewing profiles within a user account. Each profile maintains its own viewing history, preferences, and recommendations.

Key Attributes:

  • Profile ID: Unique identifier for the profile
  • Profile Name: Display name for the profile
  • Language Preference: Preferred content language
  • Content Rating: Maximum content rating allowed
  • Avatar: Profile picture URL

Relationships:

  • Many-to-One with User
  • One-to-Many with Viewing History
  • One-to-Many with Recommendations

Content Entity

The Content entity represents movies, TV shows, documentaries, and other video content available on the platform. It contains metadata for content discovery and recommendation.

Key Attributes:

  • Content ID: Unique identifier for the content
  • Title: Content title
  • Description: Detailed content description
  • Content Type: Movie, TV show, documentary, etc.
  • Genres: Array of genre classifications
  • Release Year: Year the content was released
  • Duration: Total runtime in seconds
  • Rating: Content maturity rating

Relationships:

  • One-to-Many with Viewing History
  • One-to-Many with Recommendations
  • Many-to-Many with Genres

Viewing History Entity

The Viewing History entity tracks user viewing behavior and progress through content. This data is crucial for recommendation algorithms and user experience.

Key Attributes:

  • History ID: Unique identifier for the viewing record
  • Watch Time: Time spent watching in seconds
  • Total Duration: Complete content duration
  • Completion Percentage: How much of the content was watched
  • Watch Date: When the viewing occurred
  • Device Type: Platform used for viewing
  • Quality: Video quality watched

Relationships:

  • Many-to-One with User Profile
  • Many-to-One with Content

Recommendation Entity

The Recommendation entity stores personalized content suggestions generated by the recommendation engine. Recommendations are updated regularly based on user behavior.

Key Attributes:

  • Recommendation ID: Unique identifier for the recommendation
  • Score: Confidence score for the recommendation
  • Algorithm: Method used to generate the recommendation
  • Reason: Explanation for why content was recommended
  • Model Version: Version of the recommendation model used

Relationships:

  • Many-to-One with User Profile
  • Many-to-One with Content

Data Models

User Profile

{
  "userId": "user_123",
  "profileId": "profile_456",
  "profileName": "John",
  "viewingHistory": [
    {
      "contentId": "movie_789",
      "watchTime": 3600,
      "timestamp": "2024-01-15T10:30:00Z"
    }
  ],
  "preferences": {
    "genres": ["action", "comedy"],
    "languages": ["en", "es"]
  },
  "ratings": {
    "movie_789": 5,
    "show_101": 4
  }
}

Content Metadata

{
  "contentId": "movie_789",
  "title": "The Matrix",
  "description": "A computer hacker learns about the true nature of reality...",
  "genres": ["action", "sci-fi"],
  "cast": ["Keanu Reeves", "Laurence Fishburne"],
  "duration": 7200,
  "releaseYear": 1999,
  "rating": "R",
  "availableRegions": ["US", "CA", "UK"],
  "videoFormats": {
    "480p": "video_480p_url",
    "720p": "video_720p_url",
    "1080p": "video_1080p_url"
  }
}

Recommendation Data

{
  "userId": "user_123",
  "recommendations": [
    {
      "contentId": "movie_456",
      "score": 0.95,
      "reason": "similar_users_liked"
    },
    {
      "contentId": "show_789",
      "score": 0.87,
      "reason": "genre_preference"
    }
  ],
  "lastUpdated": "2024-01-15T12:00:00Z"
}

API Design

Stream Video

GET /api/v1/stream/{contentId}
Authorization: Bearer {token}
X-Profile-Id: {profileId}

Response:
{
  "streamUrl": "https://cdn.netflix.com/video/{contentId}/playlist.m3u8",
  "quality": "720p",
  "bitrate": 5000000,
  "duration": 7200
}

Get Recommendations

GET /api/v1/recommendations
Authorization: Bearer {token}
X-Profile-Id: {profileId}

Response:
{
  "recommendations": [
    {
      "contentId": "movie_456",
      "title": "Inception",
      "score": 0.95,
      "reason": "similar_users_liked"
    }
  ]
}

Search Content

GET /api/v1/search?q=matrix&genre=action&limit=20
Authorization: Bearer {token}

Response:
{
  "results": [
    {
      "contentId": "movie_789",
      "title": "The Matrix",
      "type": "movie",
      "genres": ["action", "sci-fi"],
      "rating": 8.7
    }
  ],
  "total": 1
}

Scalability Considerations

Horizontal Scaling

The system implements horizontal scaling across all components. Microservices can be scaled independently based on demand. Load balancers distribute traffic across multiple service instances to handle increased load.

Database Scaling

Databases use read replicas and sharding strategies to handle large data volumes. User data is sharded by user ID, while content data uses content ID for distribution. Read replicas reduce load on primary databases and improve query performance.
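
Sharding by user ID can be sketched as a hash of the ID taken modulo the shard count. This is a minimal illustration with a hypothetical `shard_for` helper; the text does not specify the hashing scheme.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Python's built-in hash() is randomized per process, so use SHA-256
    # to get the same shard for the same ID on every host.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing remaps most keys when the shard count changes. Consistent hashing or a lookup table of virtual shards is the usual way to limit data movement during resharding.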

CDN Optimization

The CDN implements intelligent caching and content replication strategies. Popular content is cached at edge servers globally, while less popular content is served from regional data centers. Cache invalidation strategies ensure content freshness.

Video Delivery Optimization

Video content is pre-processed and stored in multiple formats and bitrates. CDN edge servers cache video segments locally to minimize latency. Adaptive bitrate streaming algorithms optimize quality based on network conditions.


Security Considerations

Content Protection

Video content is encrypted using DRM (Digital Rights Management) technologies. Content keys are managed securely and distributed only to authorized clients. Watermarking techniques help prevent unauthorized distribution.

User Authentication

User authentication uses OAuth 2.0 with JSON Web Tokens (JWTs) for session management. Multi-factor authentication provides additional security for user accounts. API endpoints implement rate limiting and authentication checks.
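
Rate limiting on API endpoints is often implemented with a token bucket. The sketch below shows the idea for a single client; the algorithm choice and the `TokenBucket` class are assumptions, since the text does not name a specific limiter.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket limiter: allows bursts up to capacity, refills at a steady rate."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the request may proceed."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

In a multi-instance deployment the bucket state would live in a shared store rather than in process memory, keyed by user or API token.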

Data Privacy

User viewing data is anonymized and aggregated for analytics purposes. Personal information is encrypted and stored securely. Privacy controls allow users to manage their data sharing preferences.


Monitoring and Analytics

Performance Monitoring

The system monitors key metrics including video start time, buffering events, and error rates. CDN performance metrics track cache hit rates and bandwidth usage. Service health checks ensure system availability.
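
Two of these metrics are easy to compute from raw samples: a tail percentile of video start time and the CDN cache hit rate. The helpers below are a sketch with hypothetical names, using the nearest-rank percentile definition.

```python
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding issues.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of requests served from cache; 0.0 when there is no traffic."""
    total = hits + misses
    return hits / total if total else 0.0
```

Comparing the 95th percentile of start times against the 2-second target from the non-functional requirements catches slow tails that an average would hide.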

User Analytics

User behavior analytics track viewing patterns, content preferences, and engagement metrics. A/B testing frameworks enable experimentation with recommendation algorithms and user interface changes.

Content Analytics

Content performance metrics track view counts, completion rates, and user ratings. Content recommendation effectiveness is measured through click-through rates and user engagement metrics.


Trade-offs and Considerations

Consistency vs Availability

The system prioritizes availability over strong consistency for video streaming. Viewing history can be eventually consistent without affecting playback, so streaming stays available even when a write is delayed. Content metadata served from caches and the search index is also eventually consistent, which improves read performance.

Cost vs Performance

CDN costs increase with global distribution and high bandwidth usage. The system balances performance requirements with cost constraints through intelligent caching and content optimization strategies.

Quality vs Bandwidth

Adaptive bitrate streaming balances video quality with bandwidth usage. Higher quality videos require more bandwidth but provide better user experience. The system optimizes this trade-off based on network conditions and user preferences.


Future Enhancements

Advanced Recommendations

Machine learning models can be enhanced with deep learning techniques for better recommendation accuracy. Real-time recommendation updates based on current viewing behavior can improve personalization.

Interactive Content

Support for interactive content like choose-your-own-adventure shows can enhance user engagement. Interactive features require additional infrastructure for user input processing and content branching.

Global Expansion

Expanding to new regions requires content localization and compliance with local regulations. Multi-language support and regional content libraries can improve global user experience.


Interview Tips

Key Points to Cover

  1. CDN Architecture: Explain how content delivery networks work and their role in video streaming
  2. Video Transcoding: Discuss the importance of multiple formats and adaptive bitrate streaming
  3. Recommendation Engine: Explain collaborative filtering and content-based filtering algorithms
  4. Scalability: Discuss horizontal scaling strategies and global distribution challenges

Common Follow-up Questions

  • How would you handle video buffering and quality adaptation?
  • What strategies would you use for content recommendation?
  • How would you optimize CDN performance globally?
  • What approaches would you take for video content protection?

Red Flags to Avoid

  • Not considering CDN architecture for global content delivery
  • Ignoring video transcoding and format requirements
  • Overlooking recommendation algorithm complexity
  • Not addressing scalability challenges for millions of users