Design Netflix
What is Netflix?
Netflix is a video streaming platform that provides on-demand access to movies, TV shows, documentaries, and original content. It's similar to Amazon Prime Video, Disney+, or Hulu. The service provides video streaming, content recommendation, user profiles, and global content delivery.
What sets systems like Netflix apart from a typical web application is large-scale video delivery through content delivery networks (CDNs) combined with personalized recommendations. Working through Netflix prepares you for interview questions about similar streaming platforms, because the core design challenges (video delivery, transcoding, recommendation algorithms, and global scalability) are the same.
Functional Requirements
Core (Interview Focussed)
- Video Streaming: Users can stream videos with adaptive bitrate streaming.
- Content Discovery: Browse and search through available content.
- Recommendation Engine: Personalized content recommendations for users.
- User Profiles: Multiple profiles per account with viewing history.
Out of Scope
- User authentication and subscription management
- Content creation and production workflows
- Payment processing and billing
- Mobile app specific features
- Live streaming capabilities
Non-Functional Requirements
Core (Interview Focussed)
- Low latency: Video start time under 2 seconds.
- High availability: 99.9% uptime for streaming services.
- Scalability: Handle millions of concurrent streams globally.
- Quality: Adaptive streaming based on network conditions.
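To make the availability target concrete, the short sketch below converts an uptime percentage into an allowed downtime budget. The function name and the 30-day month are illustrative assumptions, not part of the original requirements.

```python
def downtime_budget_hours(availability_pct: float, period_hours: float = 365 * 24) -> float:
    """Return the hours of downtime allowed per period at a given availability."""
    return (1 - availability_pct / 100) * period_hours


yearly = downtime_budget_hours(99.9)
monthly = downtime_budget_hours(99.9, period_hours=30 * 24)  # assumes a 30-day month
print(f"99.9% uptime allows about {yearly:.2f} h/year or {monthly * 60:.1f} min/month")
```

At 99.9% this works out to roughly 8.76 hours per year, which is a useful number to quote when an interviewer asks what the target implies operationally.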
Out of Scope
- Data retention policies
- Compliance and privacy regulations
💡 Interview Tip: Focus on low latency, high availability, and scalability. Interviewers care most about CDN architecture, video transcoding, and recommendation algorithms.
Capacity Estimation
Scale
- Users: 200 million subscribers globally
- Concurrent streams: 50 million during peak hours
- Content library: 15,000+ titles
- Video storage: 100+ petabytes of content
Traffic
- Peak concurrent users: 50 million
- Average video bitrate: 5 Mbps
- Total bandwidth: 250 Tbps during peak
- Storage per video: 1-10 GB depending on quality
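The peak bandwidth figure follows directly from the two numbers above it. A back-of-the-envelope check, using only the estimates already listed (the variable names are illustrative):

```python
CONCURRENT_STREAMS = 50_000_000   # peak concurrent streams
AVG_BITRATE_BPS = 5_000_000       # 5 Mbps average video bitrate

# Aggregate egress at peak, in terabits per second.
total_bps = CONCURRENT_STREAMS * AVG_BITRATE_BPS
total_tbps = total_bps / 1e12

# Data transferred by one stream in one hour, in decimal gigabytes.
gb_per_stream_hour = AVG_BITRATE_BPS * 3600 / 8 / 1e9

print(f"Peak bandwidth: {total_tbps:.0f} Tbps")
print(f"One hour at 5 Mbps: {gb_per_stream_hour:.2f} GB")
```

The per-hour figure of about 2.25 GB is consistent with the 1-10 GB per-video storage range quoted above for a single encoded rendition.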
High-Level Design
System Architecture Diagram
The system consists of client applications, an API gateway for request routing, microservices for the individual functions, databases for persistence, and a CDN for content delivery. The architecture supports horizontal scaling and global distribution.
Detailed Design
User Service
The User Service manages user profiles, authentication, and viewing history. It handles user registration, profile creation, and maintains viewing preferences. The service stores user data in a distributed database with read replicas for improved performance.
User profiles contain viewing history, preferences, ratings, and watchlist information. This data is crucial for the recommendation engine to provide personalized content suggestions. The service implements caching strategies to reduce database load and improve response times.
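One common way to implement such caching is the cache-aside pattern with a time-to-live. The sketch below is a minimal in-memory stand-in for a cache such as Redis; `TTLCache` and `load_profile_from_db` are hypothetical names, and the loader simply counts calls so the saved database read is visible.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a shared cache, used cache-aside."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]                # cache hit, entry still fresh
        value = loader(key)                # miss or expired: read the database
        self._store[key] = (now + self._ttl, value)
        return value


db_reads = 0


def load_profile_from_db(profile_id: str) -> dict:
    """Hypothetical database lookup that counts how often it is called."""
    global db_reads
    db_reads += 1
    return {"profileId": profile_id, "profileName": "John"}


cache = TTLCache(ttl_seconds=60)
cache.get_or_load("profile_456", load_profile_from_db)
cache.get_or_load("profile_456", load_profile_from_db)  # served from the cache
print(f"database reads: {db_reads}")
```

The TTL bounds how stale a profile can be; writes that must be visible immediately would additionally need to delete or overwrite the cached entry.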
Content Service
The Content Service manages the content catalog, metadata, and availability. It handles content ingestion, metadata extraction, and content organization. The service maintains information about movies, TV shows, genres, ratings, and availability across different regions.
Content metadata includes titles, descriptions, cast information, genres, ratings, and regional availability. The service supports content search and filtering capabilities, enabling users to discover content based on various criteria. It implements full-text search using Elasticsearch for efficient content discovery.
Recommendation Service
The Recommendation Service provides personalized content recommendations using collaborative filtering and content-based filtering algorithms. It analyzes user viewing patterns, ratings, and preferences to suggest relevant content.
The recommendation engine processes user behavior data to identify patterns and similarities between users and content. It uses machine learning models to predict user preferences and generate personalized recommendations. The service implements real-time and batch processing for different recommendation scenarios.
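As an illustration of the collaborative filtering idea, the sketch below scores unseen titles by the ratings of similar profiles, weighted by cosine similarity. It is a toy, in-memory version under assumed data (the profile IDs and ratings are made up); a production system would more likely use matrix factorization or learned models over far larger data.

```python
from math import sqrt
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # profile_id -> {content_id: rating}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[c] * b[c] for c in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(target: str, ratings: Ratings, top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank titles the target has not seen by similarity-weighted average rating."""
    seen = ratings[target]
    scores: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(seen, other_ratings)
        if sim <= 0:
            continue
        for content_id, rating in other_ratings.items():
            if content_id not in seen:
                scores[content_id] = scores.get(content_id, 0.0) + sim * rating
                weights[content_id] = weights.get(content_id, 0.0) + sim
    ranked = [(c, scores[c] / weights[c]) for c in scores]
    return sorted(ranked, key=lambda item: item[1], reverse=True)[:top_n]


ratings = {
    "profile_a": {"movie_789": 5, "show_101": 4},
    "profile_b": {"movie_789": 5, "show_101": 4, "movie_456": 5},
    "profile_c": {"show_101": 1, "show_789": 2},
}
result = recommend("profile_a", ratings)
print(result)
```

Here `profile_b` rates the shared titles the same way as `profile_a`, so its highly rated unseen title ranks first. Batch jobs could precompute such lists, with the real-time path re-ranking them as new viewing events arrive.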
Streaming Service
The Streaming Service handles video streaming requests and manages adaptive bitrate streaming. It coordinates with the CDN to deliver video efficiently and adapts quality to network conditions.
The service manages streaming sessions, tracks playback progress, and handles quality switching. Its adaptive bitrate logic selects the highest quality the available bandwidth can sustain, and it directs clients to nearby CDN edge servers to minimize latency.
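A simple throughput-based form of adaptive bitrate selection can be sketched as follows. The bitrate ladder, smoothing factor, and safety margin are assumed values for illustration; real players typically also take buffer occupancy into account.

```python
from typing import List

# Hypothetical bitrate ladder in bits per second, lowest to highest.
LADDER_BPS: List[int] = [800_000, 2_500_000, 5_000_000, 16_000_000]


def estimate_throughput(samples_bps: List[float], alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of recent segment download rates."""
    estimate = samples_bps[0]
    for sample in samples_bps[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def choose_bitrate(throughput_bps: float, safety_factor: float = 0.8) -> int:
    """Pick the highest rung that fits within a safety margin of the estimate."""
    budget = throughput_bps * safety_factor
    eligible = [rung for rung in LADDER_BPS if rung <= budget]
    return eligible[-1] if eligible else LADDER_BPS[0]


samples = [6_000_000, 7_500_000, 3_000_000]  # measured while downloading segments
estimate = estimate_throughput(samples)
print(f"estimated {estimate / 1e6:.2f} Mbps -> next segment at {choose_bitrate(estimate)} bps")
```

The safety factor keeps the player below the measured rate so that a brief dip in bandwidth does not immediately drain the buffer.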
Content Delivery Network (CDN)
The CDN distributes video content globally using edge servers located close to users. It implements intelligent caching strategies and content replication to ensure fast content delivery worldwide.
CDN architecture includes origin servers, edge servers, and cache management systems. Edge servers cache popular content locally to reduce latency and bandwidth usage. The CDN implements geographic distribution and load balancing to optimize content delivery performance.
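The edge caching behavior described above can be illustrated with a least-recently-used (LRU) cache that falls back to the origin on a miss. LRU is one possible eviction policy chosen here for illustration; `EdgeCache` and the lambda origin are hypothetical stand-ins.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """LRU cache of video segments, standing in for a CDN edge server."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self._capacity = capacity
        self._fetch = fetch_from_origin
        self._segments: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, segment_key: str) -> bytes:
        if segment_key in self._segments:
            self._segments.move_to_end(segment_key)  # mark as most recently used
            self.hits += 1
            return self._segments[segment_key]
        self.misses += 1
        data = self._fetch(segment_key)              # miss: go back to the origin
        self._segments[segment_key] = data
        if len(self._segments) > self._capacity:
            self._segments.popitem(last=False)       # evict least recently used
        return data


edge = EdgeCache(capacity=2, fetch_from_origin=lambda key: key.encode())
for key in ["seg1", "seg2", "seg1", "seg3", "seg2"]:
    edge.get(key)
print(f"hits={edge.hits} misses={edge.misses}")
```

The hit and miss counters correspond to the cache hit rate that a CDN would monitor; a higher hit rate means less traffic back to the origin.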
Video Transcoding Service
The Video Transcoding Service converts source videos into multiple formats and bitrates for adaptive streaming. It processes video content to create different quality versions suitable for various devices and network conditions.
Transcoding involves converting source videos into multiple resolutions (480p, 720p, 1080p, 4K) and bitrates. The service implements parallel processing and distributed transcoding to handle large volumes of content efficiently. It generates multiple streaming formats (HLS, DASH) for cross-platform compatibility.
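Once the renditions exist, the player needs a manifest that lists them. The sketch below renders a minimal HLS master playlist for the resolutions named above; the bitrates and relative playlist paths are assumed for illustration and are not taken from the original design.

```python
from typing import List, NamedTuple


class Rendition(NamedTuple):
    name: str
    width: int
    height: int
    bandwidth_bps: int


# Illustrative ladder; real bitrates depend on codec and content.
RENDITIONS: List[Rendition] = [
    Rendition("480p", 854, 480, 1_500_000),
    Rendition("720p", 1280, 720, 3_000_000),
    Rendition("1080p", 1920, 1080, 5_000_000),
    Rendition("4K", 3840, 2160, 16_000_000),
]


def build_master_playlist(renditions: List[Rendition]) -> str:
    """Render an HLS master playlist with one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/playlist.m3u8")  # hypothetical media playlist path
    return "\n".join(lines) + "\n"


playlist = build_master_playlist(RENDITIONS)
print(playlist)
```

Each variant entry advertises its bandwidth and resolution, which is what lets the client switch renditions as network conditions change.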
Database Design
Database Choices
Primary Database: PostgreSQL
- Rationale: ACID compliance for user data, content metadata, and transactional operations
- Use Cases: User profiles, content metadata, subscription data, viewing history
- Benefits: Strong consistency, complex queries, JSON support for flexible schemas
Recommendation Database: MongoDB
- Rationale: Document-based storage for recommendation data and user behavior analytics
- Use Cases: User preferences, recommendation models, behavioral data
- Benefits: Flexible schema, horizontal scaling, aggregation pipelines
Search Database: Elasticsearch
- Rationale: Full-text search and analytics for content discovery
- Use Cases: Content search, user search, analytics queries
- Benefits: Fast text search, faceted search, real-time analytics
Cache: Redis
- Rationale: High-performance caching for frequently accessed data
- Use Cases: User sessions, recommendation cache, content metadata cache
- Benefits: Sub-millisecond latency, data structures, pub/sub capabilities
Table Design
Users Table (PostgreSQL)
Column | Type | Constraints | Default | Description |
---|---|---|---|---|
user_id | UUID | PRIMARY KEY | - | Unique identifier for account |
email | VARCHAR(255) | UNIQUE, NOT NULL | - | Primary contact and login credential |
password_hash | VARCHAR(255) | NOT NULL | - | Salted password hash (never the plaintext password) |
subscription_tier | VARCHAR(50) | NOT NULL | - | Subscription plan (Basic, Standard, Premium) |
created_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Account creation time |
updated_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Last update time |
last_login | TIMESTAMP | - | - | Last login timestamp |
is_active | BOOLEAN | - | TRUE | Account status |
Indexes:
- idx_users_email on email
- idx_users_subscription on subscription_tier
User Profiles Table (PostgreSQL)
Column | Type | Constraints | Default | Description |
---|---|---|---|---|
profile_id | UUID | PRIMARY KEY | - | Unique identifier for profile |
user_id | UUID | REFERENCES users(user_id) | - | Reference to user account |
profile_name | VARCHAR(100) | NOT NULL | - | Display name for profile |
avatar_url | VARCHAR(500) | - | - | Profile picture URL |
language_preference | VARCHAR(10) | - | 'en' | Preferred content language |
content_rating | VARCHAR(10) | - | 'PG-13' | Maximum content rating allowed |
created_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Profile creation time |
updated_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Last update time |
Indexes:
- idx_profiles_user_id on user_id
- idx_profiles_name on profile_name
Content Table (PostgreSQL)
Column | Type | Constraints | Default | Description |
---|---|---|---|---|
content_id | UUID | PRIMARY KEY | - | Unique identifier for content |
title | VARCHAR(500) | NOT NULL | - | Content title |
description | TEXT | - | - | Content description |
content_type | VARCHAR(50) | NOT NULL | - | Type (movie, tv_show, documentary) |
genre_ids | INTEGER[] | - | - | Array of genre IDs |
release_year | INTEGER | - | - | Year content was released |
duration_seconds | INTEGER | - | - | Total runtime in seconds |
rating | VARCHAR(10) | - | - | Content maturity rating |
imdb_id | VARCHAR(20) | - | - | IMDB identifier |
created_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Content creation time |
updated_at | TIMESTAMP | - | CURRENT_TIMESTAMP | Last update time |
Indexes:
- idx_content_type on content_type
- idx_content_genre on genre_ids (GIN index)
- idx_content_year on release_year
- idx_content_title on title (GIN tsvector index)
Viewing History Table (PostgreSQL)
Column | Type | Constraints | Default | Description |
---|---|---|---|---|
history_id | UUID | PRIMARY KEY | - | Unique identifier for viewing record |
profile_id | UUID | REFERENCES user_profiles(profile_id) | - | Reference to user profile |
content_id | UUID | REFERENCES content(content_id) | - | Reference to content |
watch_time_seconds | INTEGER | NOT NULL | - | Time spent watching in seconds |
total_duration_seconds | INTEGER | NOT NULL | - | Total content duration |
completion_percentage | DECIMAL(5,2) | - | - | Percentage of content watched |
watched_at | TIMESTAMP | - | CURRENT_TIMESTAMP | When viewing occurred |
device_type | VARCHAR(50) | - | - | Platform used for viewing |
quality | VARCHAR(20) | - | - | Video quality watched |
Indexes:
- idx_history_profile on profile_id
- idx_history_content on content_id
- idx_history_watched_at on watched_at
Recommendations Collection (MongoDB)
Field | Type | Description |
---|---|---|
_id | ObjectId | MongoDB document identifier |
profile_id | String | Reference to user profile |
recommendations | Array | List of recommended content |
recommendations.content_id | String | Reference to recommended content |
recommendations.score | Number | Confidence score (0-1) |
recommendations.algorithm | String | Algorithm used (collaborative_filtering, content_based) |
recommendations.reason | String | Explanation for recommendation |
last_updated | Date | When recommendations were last updated |
model_version | String | Version of recommendation model |
Example Document:
{
  "_id": ObjectId("..."),
  "profile_id": "profile_123",
  "recommendations": [
    {
      "content_id": "content_456",
      "score": 0.95,
      "algorithm": "collaborative_filtering",
      "reason": "similar_users_liked"
    }
  ],
  "last_updated": ISODate("2024-01-15T12:00:00Z"),
  "model_version": "v2.1"
}
Content Search Index (Elasticsearch)
Field | Type | Analyzer | Description |
---|---|---|---|
content_id | keyword | - | Unique content identifier |
title | text | english | Content title (searchable) |
title.keyword | keyword | - | Content title (exact match) |
description | text | english | Content description (searchable) |
genres | keyword | - | Array of genre names |
cast | keyword | - | Array of cast member names |
release_year | integer | - | Year content was released |
rating | float | - | Content rating score |
content_type | keyword | - | Type (movie, tv_show, documentary) |
Index Configuration:
{
  "mappings": {
    "properties": {
      "content_id": {"type": "keyword"},
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "keyword": {"type": "keyword"}
        }
      },
      "description": {"type": "text", "analyzer": "english"},
      "genres": {"type": "keyword"},
      "cast": {"type": "keyword"},
      "release_year": {"type": "integer"},
      "rating": {"type": "float"},
      "content_type": {"type": "keyword"}
    }
  }
}
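A query against this mapping might combine full-text matching on the analyzed fields with an exact filter on a keyword field. The helper below is a sketch that only builds the request body as a Python dictionary; `build_search_query` and the title boost of 2 are assumptions for illustration, and sending the body to a cluster is left out.

```python
import json
from typing import Any, Dict, List, Optional


def build_search_query(text: str, genre: Optional[str] = None, limit: int = 20) -> Dict[str, Any]:
    """Build an Elasticsearch search body for the content index mapping."""
    filters: List[Dict[str, Any]] = []
    if genre is not None:
        # "genres" is a keyword field, so this is an exact, case-sensitive match.
        filters.append({"term": {"genres": genre}})
    return {
        "size": limit,
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title^2", "description"],  # weight title matches higher
                        }
                    }
                ],
                "filter": filters,
            }
        },
    }


query = build_search_query("matrix", genre="action")
print(json.dumps(query, indent=2))
```

Putting the genre in `filter` rather than `must` keeps it out of relevance scoring, so the ranking is driven by the text match alone.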
Core Entities
User Entity
The User entity represents a Netflix account holder with subscription information and authentication details. Each user can have multiple profiles for different family members or viewing preferences.
Key Attributes:
- User ID: Unique identifier for the account
- Email: Primary contact and login credential
- Subscription Tier: Basic, Standard, or Premium plan
- Account Status: Active, suspended, or cancelled
- Creation Date: When the account was created
Relationships:
- One-to-Many with User Profiles
- One-to-Many with Viewing History (through profiles)
- One-to-Many with Recommendations (through profiles)
User Profile Entity
The User Profile entity represents individual viewing profiles within a user account. Each profile maintains its own viewing history, preferences, and recommendations.
Key Attributes:
- Profile ID: Unique identifier for the profile
- Profile Name: Display name for the profile
- Language Preference: Preferred content language
- Content Rating: Maximum content rating allowed
- Avatar: Profile picture URL
Relationships:
- Many-to-One with User
- One-to-Many with Viewing History
- One-to-Many with Recommendations
Content Entity
The Content entity represents movies, TV shows, documentaries, and other video content available on the platform. It contains metadata for content discovery and recommendation.
Key Attributes:
- Content ID: Unique identifier for the content
- Title: Content title
- Description: Detailed content description
- Content Type: Movie, TV show, documentary, etc.
- Genres: Array of genre classifications
- Release Year: Year the content was released
- Duration: Total runtime in seconds
- Rating: Content maturity rating
Relationships:
- One-to-Many with Viewing History
- One-to-Many with Recommendations
- Many-to-Many with Genres
Viewing History Entity
The Viewing History entity tracks user viewing behavior and progress through content. This data is crucial for recommendation algorithms and user experience.
Key Attributes:
- History ID: Unique identifier for the viewing record
- Watch Time: Time spent watching in seconds
- Total Duration: Complete content duration
- Completion Percentage: How much of the content was watched
- Watch Date: When the viewing occurred
- Device Type: Platform used for viewing
- Quality: Video quality watched
Relationships:
- Many-to-One with User Profile
- Many-to-One with Content
Recommendation Entity
The Recommendation entity stores personalized content suggestions generated by the recommendation engine. Recommendations are updated regularly based on user behavior.
Key Attributes:
- Recommendation ID: Unique identifier for the recommendation
- Score: Confidence score for the recommendation
- Algorithm: Method used to generate the recommendation
- Reason: Explanation for why content was recommended
- Model Version: Version of the recommendation model used
Relationships:
- Many-to-One with User Profile
- Many-to-One with Content
Data Models
User Profile
{
  "userId": "user_123",
  "profileId": "profile_456",
  "profileName": "John",
  "viewingHistory": [
    {
      "contentId": "movie_789",
      "watchTime": 3600,
      "timestamp": "2024-01-15T10:30:00Z"
    }
  ],
  "preferences": {
    "genres": ["action", "comedy"],
    "languages": ["en", "es"]
  },
  "ratings": {
    "movie_789": 5,
    "show_101": 4
  }
}
Content Metadata
{
  "contentId": "movie_789",
  "title": "The Matrix",
  "description": "A computer hacker learns about the true nature of reality...",
  "genres": ["action", "sci-fi"],
  "cast": ["Keanu Reeves", "Laurence Fishburne"],
  "duration": 7200,
  "releaseYear": 1999,
  "rating": "R",
  "availableRegions": ["US", "CA", "UK"],
  "videoFormats": {
    "480p": "video_480p_url",
    "720p": "video_720p_url",
    "1080p": "video_1080p_url"
  }
}
Recommendation Data
{
  "userId": "user_123",
  "recommendations": [
    {
      "contentId": "movie_456",
      "score": 0.95,
      "reason": "similar_users_liked"
    },
    {
      "contentId": "show_789",
      "score": 0.87,
      "reason": "genre_preference"
    }
  ],
  "lastUpdated": "2024-01-15T12:00:00Z"
}
API Design
Stream Video
GET /api/v1/stream/{contentId}
Authorization: Bearer {token}
X-Profile-Id: {profileId}
Response:
{
  "streamUrl": "https://cdn.netflix.com/video/{contentId}/playlist.m3u8",
  "quality": "720p",
  "bitrate": 5000000,
  "duration": 7200
}
Get Recommendations
GET /api/v1/recommendations
Authorization: Bearer {token}
X-Profile-Id: {profileId}
Response:
{
  "recommendations": [
    {
      "contentId": "movie_456",
      "title": "Inception",
      "score": 0.95,
      "reason": "similar_users_liked"
    }
  ]
}
Search Content
GET /api/v1/search?q=matrix&genre=action&limit=20
Authorization: Bearer {token}
Response:
{
  "results": [
    {
      "contentId": "movie_789",
      "title": "The Matrix",
      "type": "movie",
      "genres": ["action", "sci-fi"],
      "rating": 8.7
    }
  ],
  "total": 1
}
Scalability Considerations
Horizontal Scaling
The system implements horizontal scaling across all components. Microservices can be scaled independently based on demand. Load balancers distribute traffic across multiple service instances to handle increased load.
Database Scaling
Databases use read replicas and sharding strategies to handle large data volumes. User data is sharded by user ID, while content data uses content ID for distribution. Read replicas reduce load on primary databases and improve query performance.
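Routing a request to the right shard can be as simple as hashing the key. The sketch below uses a stable hash with modulo placement; `shard_for` and the shard count of 16 are illustrative assumptions rather than part of the original design.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a user ID or content ID to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 16  # assumed shard count
for user_id in ["user_123", "user_124", "user_125"]:
    print(user_id, "->", shard_for(user_id, NUM_SHARDS))
```

A cryptographic hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings. Plain modulo placement moves most keys when the shard count changes, so a design that expects to add shards often might prefer consistent hashing or a lookup table.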
CDN Optimization
The CDN implements intelligent caching and content replication strategies. Popular content is cached at edge servers globally, while less popular content is served from regional data centers. Cache invalidation strategies ensure content freshness.
Video Delivery Optimization
Video content is pre-processed and stored in multiple formats and bitrates. CDN edge servers cache video segments locally to minimize latency. Adaptive bitrate streaming algorithms optimize quality based on network conditions.
Security Considerations
Content Protection
Video content is encrypted using DRM (Digital Rights Management) technologies. Content keys are managed securely and distributed only to authorized clients. Watermarking techniques help prevent unauthorized distribution.
User Authentication
User authentication uses OAuth 2.0 with JWT tokens for session management. Multi-factor authentication provides additional security for user accounts. API endpoints implement rate limiting and authentication checks.
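Rate limiting on API endpoints is often implemented with a token bucket. The sketch below is a single-process version with an injectable clock; the class name and parameters are illustrative, and a real deployment would likely keep the bucket state in a shared store so that all gateway instances enforce the same limit.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity` and a sustained `rate_per_sec` afterwards."""

    def __init__(
        self,
        rate_per_sec: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_sec=5, capacity=10)
decisions = [bucket.allow() for _ in range(12)]
print(decisions.count(True), "allowed of", len(decisions))
```

A rejected request would typically be answered with HTTP 429 so that clients know to back off.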
Data Privacy
User viewing data is anonymized and aggregated for analytics purposes. Personal information is encrypted and stored securely. Privacy controls allow users to manage their data sharing preferences.
Monitoring and Analytics
Performance Monitoring
The system monitors key metrics including video start time, buffering events, and error rates. CDN performance metrics track cache hit rates and bandwidth usage. Service health checks ensure system availability.
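Latency targets such as video start time are usually tracked as percentiles rather than averages, because a few slow starts can hide behind a good mean. The sketch below computes a nearest-rank percentile over sample start times and compares it with the 2-second target; the sample values are made up for illustration.

```python
import math
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in the range (0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical video start times in seconds.
start_times = [0.8, 1.1, 0.9, 1.4, 2.6, 1.0, 1.2, 0.7, 1.3, 1.9]
p50 = percentile(start_times, 50)
p95 = percentile(start_times, 95)
target_met = p95 < 2.0
print(f"p50={p50}s p95={p95}s target_met={target_met}")
```

In this sample the median is well under the target while the 95th percentile is not, which is the kind of tail problem an average would mask.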
User Analytics
User behavior analytics track viewing patterns, content preferences, and engagement metrics. A/B testing frameworks enable experimentation with recommendation algorithms and user interface changes.
Content Analytics
Content performance metrics track view counts, completion rates, and user ratings. Content recommendation effectiveness is measured through click-through rates and user engagement metrics.
Trade-offs and Considerations
Consistency vs Availability
The system prioritizes availability over strong consistency on the streaming path. Viewing history can be eventually consistent, and content metadata served from caches and read replicas may briefly lag the primary database. A playback position or catalog change can take a short time to appear everywhere, which is acceptable as long as streaming stays available.
Cost vs Performance
CDN costs increase with global distribution and high bandwidth usage. The system balances performance requirements with cost constraints through intelligent caching and content optimization strategies.
Quality vs Bandwidth
Adaptive bitrate streaming balances video quality with bandwidth usage. Higher quality videos require more bandwidth but provide better user experience. The system optimizes this trade-off based on network conditions and user preferences.
Future Enhancements
Advanced Recommendations
Machine learning models can be enhanced with deep learning techniques for better recommendation accuracy. Real-time recommendation updates based on current viewing behavior can improve personalization.
Interactive Content
Support for interactive content like choose-your-own-adventure shows can enhance user engagement. Interactive features require additional infrastructure for user input processing and content branching.
Global Expansion
Expanding to new regions requires content localization and compliance with local regulations. Multi-language support and regional content libraries can improve global user experience.
Interview Tips
Key Points to Cover
- CDN Architecture: Explain how content delivery networks work and their role in video streaming
- Video Transcoding: Discuss the importance of multiple formats and adaptive bitrate streaming
- Recommendation Engine: Explain collaborative filtering and content-based filtering algorithms
- Scalability: Discuss horizontal scaling strategies and global distribution challenges
Common Follow-up Questions
- How would you handle video buffering and quality adaptation?
- What strategies would you use for content recommendation?
- How would you optimize CDN performance globally?
- What approaches would you take for video content protection?
Red Flags to Avoid
- Not considering CDN architecture for global content delivery
- Ignoring video transcoding and format requirements
- Overlooking recommendation algorithm complexity
- Not addressing scalability challenges for millions of users