Design YouTube

System Design Challenge

Difficulty: hard
Time: 45-60 minutes
Topics: CDN, video processing, microservices, cache, storage

What is YouTube?

YouTube is a video sharing platform that allows users to upload, watch, and share videos. It's similar to Vimeo, TikTok, or Twitch. The service provides video hosting, streaming, and content discovery.

Processing uploaded video and delivering it through a CDN are what set systems like YouTube apart. Once you understand YouTube, you can tackle interview questions about similar video platforms, because the core design challenges (video processing, CDN distribution, content delivery, and scalability) remain the same.


Functional Requirements

Core (Interview-Focused)

  • Video Upload: Users can upload videos of various formats and sizes.
  • Video Streaming: Users can stream videos with different quality options.
  • Content Discovery: Users can discover videos through search and recommendations.
  • Video Processing: Process videos for different quality levels and formats.

Out of Scope

  • User authentication and accounts
  • Video monetization and ads
  • Live streaming
  • Video analytics and insights
  • Mobile app specific features

Non-Functional Requirements

Core (Interview-Focused)

  • High availability: 99.9% uptime for video streaming.
  • Scalability: Handle petabytes of video content.
  • Performance: Low startup latency (time to first frame) and minimal rebuffering during playback.
  • Global distribution: Serve videos worldwide with low latency.
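To make the 99.9% target concrete, the arithmetic below converts an availability fraction into a downtime budget. It is a back-of-envelope sketch that ignores leap years and assumes 30-day months; the function names are illustrative.

```python
# Downtime budget implied by an availability target such as 99.9% (0.999).
HOURS_PER_YEAR = 365 * 24               # leap years ignored for a rough estimate
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def allowed_downtime_hours_per_year(availability: float) -> float:
    """Hours per year the service may be down and still meet the target."""
    return (1.0 - availability) * HOURS_PER_YEAR


def allowed_downtime_minutes_per_month(availability: float) -> float:
    """Minutes per 30-day month the service may be down and still meet the target."""
    return (1.0 - availability) * MINUTES_PER_30_DAY_MONTH


if __name__ == "__main__":
    target = 0.999
    print(f"{allowed_downtime_hours_per_year(target):.2f} h/year")        # 8.76 h/year
    print(f"{allowed_downtime_minutes_per_month(target):.1f} min/month")  # 43.2 min/month
```

At 99.9%, the whole system has under nine hours of downtime per year, which is why the design leans on redundant CDN nodes and horizontally scaled services.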

Out of Scope

  • Data retention policies
  • Compliance and privacy regulations

💡 Interview Tip: Focus on high availability, scalability, and performance. Interviewers care most about video processing, CDN distribution, and content delivery.


Core Entities

| Entity | Key Attributes | Notes |
| --- | --- | --- |
| Video | video_id, user_id, title, description, duration, upload_date | Indexed by upload_date for recent videos |
| User | user_id, username, email, subscriber_count | User account information |
| VideoFile | file_id, video_id, quality, format, file_url, file_size | Video file information |
| Category | category_id, name, description, video_count | Video categorization |
| View | view_id, video_id, user_id, timestamp, duration | Video view tracking |

💡 Interview Tip: Focus on Video, VideoFile, and View as they drive video processing, content delivery, and view tracking.


Core APIs

Video Management

  • POST /videos { title, description, category, file } – Upload a new video
  • GET /videos/{video_id} – Get video details
  • PUT /videos/{video_id} { title, description } – Update video information
  • DELETE /videos/{video_id} – Delete a video
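Uploading a multi-gigabyte file in a single POST body is fragile on unreliable networks. A common alternative, assumed here rather than part of the API above, is a chunked, resumable upload: the client splits the file into byte ranges, uploads each one, and after a disconnect resends only the chunks the server has not acknowledged. The sketch shows only the client-side bookkeeping; the chunk size and helper names are hypothetical.

```python
from typing import List, Set, Tuple

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per chunk (assumed value)


def plan_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """Split a file into half-open byte ranges [start, end)."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    return [(start, min(start + chunk_size, file_size))
            for start in range(0, file_size, chunk_size)]


def remaining_chunks(plan: List[Tuple[int, int]], acknowledged: Set[int]) -> List[int]:
    """Indexes of chunks the server has not acknowledged, i.e. what to resend on resume."""
    return [i for i in range(len(plan)) if i not in acknowledged]


if __name__ == "__main__":
    plan = plan_chunks(10, chunk_size=4)
    print(plan)                            # [(0, 4), (4, 8), (8, 10)]
    print(remaining_chunks(plan, {0, 2}))  # [1]
```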

Video Streaming

  • GET /videos/{video_id}/stream?quality= – Stream video content
  • GET /videos/{video_id}/thumbnail – Get video thumbnail
  • GET /videos/{video_id}/formats – Get available video formats
  • POST /videos/{video_id}/view – Record video view

Content Discovery

  • GET /videos/search?query=&category=&limit= – Search for videos
  • GET /videos/trending?category=&limit= – Get trending videos
  • GET /videos/recommended?user_id=&limit= – Get recommended videos
  • GET /videos/category/{category_id}?limit= – Get videos by category

User Management

  • GET /users/{user_id}/videos – Get user's videos
  • GET /users/{user_id}/subscriptions – Get user's subscriptions
  • POST /users/{user_id}/subscribe – Subscribe the requesting user to user {user_id}'s channel
  • GET /users/{user_id}/recommendations – Get personalized recommendations

High-Level Design

Key Components

  • Video Service: Handle video CRUD operations
  • Video Processing Service: Process videos for different formats and qualities
  • CDN Service: Distribute video content globally
  • Content Discovery Service: Handle search and recommendations
  • Streaming Service: Manage video streaming and delivery
  • Database: Persistent storage for video metadata, users, and views
  • Object Storage: Blob storage for raw uploads, transcoded video files, and thumbnails

Mapping Core Functional Requirements to Components

| Functional Requirement | Responsible Components | Key Considerations |
| --- | --- | --- |
| Video Upload | Video Service, Video Processing Service | File upload, video processing |
| Video Streaming | Streaming Service, CDN Service | Content delivery, quality adaptation |
| Content Discovery | Content Discovery Service, Database | Search algorithms, recommendation systems |
| Video Processing | Video Processing Service, Storage | Transcoding, format conversion |

Detailed Design

Video Processing Service

Purpose: Process uploaded videos for different formats and quality levels.

Key Design Decisions:

  • Transcoding: Convert videos to multiple formats and qualities
  • Thumbnail Generation: Generate video thumbnails
  • Metadata Extraction: Extract video metadata
  • Quality Optimization: Optimize video quality for different devices

Algorithm: Video processing

1. Receive uploaded video file
2. Validate video format and size
3. Extract video metadata:
   - Duration
   - Resolution
   - Frame rate
   - Bitrate
4. Generate video thumbnails:
   - Extract frames at intervals
   - Generate thumbnail images
   - Store thumbnails
5. Transcode video:
   - Convert to multiple formats (MP4, WebM)
   - Generate different quality levels
   - Optimize for different devices
6. Store processed videos
7. Update video status
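Steps 4 and 5 can be sketched as follows, assuming ffmpeg does the actual encoding. The ladder of heights and bitrates is an illustrative assumption, not a recommended configuration, and the code only builds commands; running them (for example with `subprocess.run(cmd, check=True)`) is left to the worker. Only MP4 output is shown; a WebM rendition would swap in a VP9 encoder such as `libvpx-vp9`.

```python
from typing import List, Tuple

# Assumed bitrate ladder: (height in pixels, video bitrate in kbit/s), highest first.
# Real ladders are tuned per codec and often per title.
LADDER: List[Tuple[int, int]] = [(1080, 5000), (720, 2800), (480, 1400), (360, 800), (240, 400)]


def select_renditions(source_height: int) -> List[Tuple[int, int]]:
    """Keep rungs at or below the source height so nothing is upscaled."""
    rungs = [(h, kbps) for h, kbps in LADDER if h <= source_height]
    # A source smaller than every rung still gets one rendition at its native height.
    return rungs or [(source_height, LADDER[-1][1])]


def ffmpeg_command(src: str, height: int, kbps: int) -> List[str]:
    """Build (not run) an ffmpeg command for one H.264/AAC MP4 rendition."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",            # keep aspect ratio, force an even width
        "-c:v", "libx264", "-b:v", f"{kbps}k",
        "-c:a", "aac", "-b:a", "128k",
        f"{height}p.mp4",
    ]


def thumbnail_times(duration_s: float, count: int = 3) -> List[float]:
    """Evenly spaced timestamps (seconds) for thumbnails, skipping the very start and end."""
    step = duration_s / (count + 1)
    return [round(step * (i + 1), 2) for i in range(count)]


if __name__ == "__main__":
    for height, kbps in select_renditions(720):
        print(" ".join(ffmpeg_command("upload.mov", height, kbps)))
    print(thumbnail_times(120.0))  # [30.0, 60.0, 90.0]
```

Selecting only rungs at or below the source height avoids spending storage on upscaled copies that add no detail. A thumbnail at one of the computed timestamps could be grabbed with `ffmpeg -ss <t> -i <src> -frames:v 1 thumb.jpg`.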

CDN Service

Purpose: Distribute video content globally with low latency.

Key Design Decisions:

  • Global Distribution: Deploy CDN nodes worldwide
  • Content Caching: Cache popular videos at edge locations
  • Load Balancing: Distribute traffic across CDN nodes
  • Cache Management: Manage cache expiration and updates
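Steps 2 and 3 of the distribution algorithm that follows amount to picking the closest healthy edge. The sketch uses great-circle (haversine) distance over a hypothetical node list. It is a simplification: production CDNs usually steer users with GeoDNS or anycast and also weigh node load, which ties into the Load Balancing decision above.

```python
import math
from typing import Dict, Optional, Set, Tuple

EARTH_RADIUS_KM = 6371.0

# Hypothetical edge locations as (latitude, longitude) in degrees.
EDGE_NODES: Dict[str, Tuple[float, float]] = {
    "edge-frankfurt": (50.11, 8.68),
    "edge-virginia": (38.95, -77.45),
    "edge-singapore": (1.35, 103.82),
}


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_edge(client: Tuple[float, float], healthy: Optional[Set[str]] = None) -> str:
    """Closest node to the client, restricted to healthy nodes when a set is given."""
    candidates = [n for n in EDGE_NODES if healthy is None or n in healthy]
    if not candidates:
        raise RuntimeError("no healthy edge node available")
    return min(candidates, key=lambda n: haversine_km(client, EDGE_NODES[n]))


if __name__ == "__main__":
    paris = (48.86, 2.35)
    print(nearest_edge(paris))                                        # edge-frankfurt
    print(nearest_edge(paris, healthy={"edge-virginia", "edge-singapore"}))  # edge-virginia
```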

Algorithm: CDN content distribution

1. Receive video streaming request
2. Determine user's geographic location
3. Find nearest CDN node
4. Check if video is cached at node
5. If cached:
   - Serve video from cache
   - Update cache statistics
6. If not cached:
   - Fetch video from origin server
   - Cache video at edge node
   - Serve video to user
7. Monitor CDN performance
8. Update cache policies
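The cache-hit and cache-miss branches (steps 4 to 6) can be sketched as a least-recently-used cache with a byte budget. Keys would typically be individual video segments rather than whole files, so a popular video's opening segments can stay cached even when its tail is evicted. LRU is an assumed eviction policy, and `fetch_from_origin` is a hypothetical callable standing in for the origin server. The hit and miss counters are the raw inputs for the CDN hit-rate metric.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """LRU cache for one edge node, bounded by total bytes stored."""

    def __init__(self, capacity_bytes: int, fetch_from_origin: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.fetch = fetch_from_origin
        self.items: "OrderedDict[str, bytes]" = OrderedDict()
        self.used = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.items:                  # cached: serve from the edge
            self.items.move_to_end(key)        # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        data = self.fetch(key)                 # not cached: fetch from origin
        if len(data) <= self.capacity:         # objects larger than the cache pass through
            self.items[key] = data
            self.used += len(data)
            while self.used > self.capacity:   # evict least recently used entries
                _, evicted = self.items.popitem(last=False)
                self.used -= len(evicted)
        return data

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```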

Content Discovery Service

Purpose: Handle video search and recommendation systems.

Key Design Decisions:

  • Search Algorithms: Use full-text search and content-based search
  • Recommendation Engine: Generate personalized video recommendations
  • Trending Algorithm: Identify trending videos
  • Content Filtering: Filter content based on user preferences
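Full-text search rests on an inverted index that maps each term to the videos containing it. The sketch below indexes titles and descriptions and answers queries with AND semantics. It is a toy: a production system would more likely use a search engine such as Elasticsearch, with stemming, typo tolerance, and relevance ranking (for example BM25).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return TOKEN.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # term -> video_ids

    def add(self, video_id: str, title: str, description: str) -> None:
        for term in tokenize(f"{title} {description}"):
            self.postings[term].add(video_id)

    def search(self, query: str) -> Set[str]:
        """Return the video_ids that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())  # intersect posting lists
        return result
```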

Algorithm: Video recommendation

1. Analyze user behavior:
   - Watch history
   - Search history
   - Like/dislike patterns
   - Subscription preferences
2. Find similar users:
   - Users with similar watch patterns
   - Users with similar preferences
3. Generate recommendations:
   - Videos liked by similar users
   - Videos in preferred categories
   - Trending videos
4. Rank recommendations:
   - User preference score
   - Video popularity
   - Recency factor
5. Return personalized recommendations
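The recommendation steps can be sketched as user-based collaborative filtering. In this version, similarity between users is the Jaccard overlap of their watch histories, candidates are unseen videos watched by similar users, and ranking blends that preference score with popularity and an exponential recency decay. The weights, the 48-hour half-life, and the assumption that popularity is pre-normalized to [0, 1] are illustrative; the sketch also skips the trending-video candidates from step 3 and the search, like, and subscription signals from step 1.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two watch histories, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str,
              history: Dict[str, Set[str]],    # user_id -> watched video_ids
              popularity: Dict[str, float],    # video_id -> popularity in [0, 1] (assumed)
              age_hours: Dict[str, float],     # video_id -> hours since upload
              k: int = 5, w_pref: float = 0.7, w_pop: float = 0.3,
              half_life_h: float = 48.0) -> List[str]:
    seen = history.get(user, set())

    # Steps 1-3: each unseen video gets credit from every similar user who watched it.
    pref: Dict[str, float] = {}
    for other, watched in history.items():
        if other == user:
            continue
        sim = jaccard(seen, watched)
        if sim == 0.0:
            continue
        for video in watched - seen:
            pref[video] = pref.get(video, 0.0) + sim

    # Step 4: blend preference with popularity, halving the score every half_life_h hours.
    def score(video: str) -> float:
        recency = 0.5 ** (age_hours.get(video, 0.0) / half_life_h)
        return (w_pref * pref[video] + w_pop * popularity.get(video, 0.0)) * recency

    # Step 5: return the top-k candidates.
    return sorted(pref, key=score, reverse=True)[:k]
```

A user with no watch history gets an empty list here, which is exactly the cold-start case where falling back to trending videos makes sense.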

Streaming Service

Purpose: Manage video streaming and quality adaptation.

Key Design Decisions:

  • Adaptive Streaming: Adjust video quality based on network conditions
  • Buffering Management: Manage video buffering and preloading
  • Quality Selection: Select appropriate video quality
  • Stream Optimization: Optimize streaming performance
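Adaptive streaming needs a manifest that tells the player which renditions exist. For HLS this is a master playlist with one `#EXT-X-STREAM-INF` entry per variant; DASH uses an MPD file instead. The sketch assumes a hypothetical `{height}p/index.m3u8` path layout per rendition and omits the `CODECS` attribute that real playlists should include. `BANDWIDTH` is the peak bitrate of the variant in bits per second, audio included.

```python
from typing import List, Tuple


def master_playlist(renditions: List[Tuple[int, int, int]]) -> str:
    """Build an HLS master playlist.

    renditions: (width, height, bandwidth in bits/s) tuples, one per variant.
    """
    lines = ["#EXTM3U"]
    for width, height, bandwidth in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}")
        lines.append(f"{height}p/index.m3u8")  # hypothetical per-rendition media playlist
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist([(1280, 720, 2_800_000), (640, 360, 800_000)]), end="")
```

The server only publishes the options; the choice among them happens in the player, segment by segment.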

Algorithm: Adaptive streaming

1. Receive video streaming request
2. Detect user's network conditions:
   - Bandwidth
   - Latency
   - Device capabilities
3. Select appropriate video quality:
   - Start with medium quality
   - Adjust based on network conditions
   - Consider device capabilities
4. Stream video content:
   - Send video chunks
   - Monitor streaming performance
   - Adjust quality as needed
5. Handle streaming errors:
   - Retry failed requests
   - Fallback to lower quality
   - Notify user of issues
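On the client, the algorithm above is commonly realized as throughput-based adaptive bitrate selection: measure how fast recent segments downloaded, discount the estimate by a safety margin, and request the highest rendition that fits. The ladder, the 0.8 safety factor, and the five-sample window are assumptions; real players often also take buffer level into account.

```python
from collections import deque
from typing import Deque, List, Sequence

# Assumed ladder in kbit/s, lowest to highest (roughly 240p to 1080p).
LADDER_KBPS = [400, 800, 1400, 2800, 5000]


class ThroughputAbr:
    def __init__(self, ladder_kbps: Sequence[int] = LADDER_KBPS,
                 safety: float = 0.8, window: int = 5) -> None:
        self.ladder: List[int] = sorted(ladder_kbps)
        self.safety = safety                        # use only 80% of measured throughput
        self.samples: Deque[float] = deque(maxlen=window)
        self.index = len(self.ladder) // 2          # step 3: start at a medium quality

    def record_download(self, size_bytes: int, seconds: float) -> None:
        """Store the throughput (kbit/s) observed for one completed segment."""
        self.samples.append(size_bytes * 8 / 1000 / seconds)

    def next_bitrate(self) -> int:
        """Bitrate for the next segment: highest rung at or below the discounted estimate."""
        if self.samples:
            estimate = self.safety * sum(self.samples) / len(self.samples)
            fits = [i for i, kbps in enumerate(self.ladder) if kbps <= estimate]
            self.index = fits[-1] if fits else 0    # fall back to the lowest rung
        return self.ladder[self.index]

    def on_segment_error(self) -> int:
        """Step 5: after a failed request, drop one rung for the retry."""
        self.index = max(0, self.index - 1)
        return self.ladder[self.index]
```

The retry after an error uses the lower rung; the next call to `next_bitrate` re-estimates from measured throughput, so quality climbs back once the network recovers.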

Database Design

Videos Table

| Field | Type | Description |
| --- | --- | --- |
| video_id | VARCHAR(36) | Primary key |
| user_id | VARCHAR(36) | Video owner |
| title | VARCHAR(255) | Video title |
| description | TEXT | Video description |
| category | VARCHAR(100) | Video category |
| duration | INT | Video duration in seconds |
| upload_date | TIMESTAMP | Upload timestamp |
| view_count | BIGINT | Total views |
| like_count | INT | Total likes |

Indexes:

  • idx_user_id on (user_id) - User videos
  • idx_category on (category) - Category-based queries
  • idx_upload_date on (upload_date) - Recent videos
  • idx_view_count on (view_count) - Popular videos

Video Files Table

| Field | Type | Description |
| --- | --- | --- |
| file_id | VARCHAR(36) | Primary key |
| video_id | VARCHAR(36) | Associated video |
| quality | VARCHAR(20) | Video quality |
| format | VARCHAR(10) | Video format |
| file_url | TEXT | File storage URL |
| file_size | BIGINT | File size in bytes |

Indexes:

  • idx_video_id on (video_id) - Video files
  • idx_quality on (quality) - Quality-based queries

Users Table

| Field | Type | Description |
| --- | --- | --- |
| user_id | VARCHAR(36) | Primary key |
| username | VARCHAR(100) | Username |
| email | VARCHAR(255) | Email address |
| subscriber_count | INT | Number of subscribers |
| video_count | INT | Number of videos |
| created_at | TIMESTAMP | Account creation |

Indexes:

  • idx_username on (username) - Username lookup
  • idx_subscriber_count on (subscriber_count) - Popular creators

Views Table

| Field | Type | Description |
| --- | --- | --- |
| view_id | VARCHAR(36) | Primary key |
| video_id | VARCHAR(36) | Viewed video |
| user_id | VARCHAR(36) | Viewer (optional) |
| timestamp | TIMESTAMP | View timestamp |
| duration | INT | View duration in seconds |

Indexes:

  • idx_video_id on (video_id) - Video views
  • idx_user_id on (user_id) - User views
  • idx_timestamp on (timestamp) - View history
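As a concrete starting point, the Videos and Video Files tables translate to the DDL below, run here against an in-memory SQLite database through Python's `sqlite3` module. Column types follow the tables above, although SQLite treats them loosely through type affinity. One caveat if the Views table is added too: SQLite and PostgreSQL require index names to be unique within a schema, so its `idx_video_id` and `idx_user_id` would need distinct names (MySQL scopes index names per table).

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE videos (
    video_id    VARCHAR(36) PRIMARY KEY,
    user_id     VARCHAR(36) NOT NULL,          -- video owner
    title       VARCHAR(255) NOT NULL,
    description TEXT,
    category    VARCHAR(100),
    duration    INT,                           -- seconds
    upload_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    view_count  BIGINT DEFAULT 0,
    like_count  INT DEFAULT 0
);
CREATE INDEX idx_user_id     ON videos (user_id);
CREATE INDEX idx_category    ON videos (category);
CREATE INDEX idx_upload_date ON videos (upload_date);
CREATE INDEX idx_view_count  ON videos (view_count);

CREATE TABLE video_files (
    file_id   VARCHAR(36) PRIMARY KEY,
    video_id  VARCHAR(36) NOT NULL REFERENCES videos (video_id),
    quality   VARCHAR(20),
    format    VARCHAR(10),
    file_url  TEXT,
    file_size BIGINT                           -- bytes
);
CREATE INDEX idx_video_id ON video_files (video_id);
CREATE INDEX idx_quality  ON video_files (quality);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def renditions_for(conn: sqlite3.Connection, video_id: str) -> List[Tuple[str, str, str]]:
    """All stored renditions of a video, largest file first (a lookup idx_video_id serves)."""
    return conn.execute(
        "SELECT quality, format, file_url FROM video_files "
        "WHERE video_id = ? ORDER BY file_size DESC",
        (video_id,),
    ).fetchall()
```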

Scalability Considerations

Horizontal Scaling

  • Video Service: Scale horizontally with load balancers
  • Video Processing Service: Run transcoding on a pool of stateless workers that pull jobs from a queue; add workers as the backlog grows
  • CDN Service: Scale CDN nodes globally
  • Database: Shard videos and users by geographic regions

Caching Strategy

  • CDN: Cache video content globally
  • Redis: Cache video metadata and recommendations
  • Application Cache: Cache frequently accessed data
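Metadata caching usually follows the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache with a TTL. In the sketch a small in-process class stands in for Redis; with redis-py the equivalent calls would be `get(key)` and `set(key, value, ex=ttl_seconds)`. `load_from_db` is a hypothetical callable, and the injectable clock exists only to make expiry easy to test.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """In-process stand-in for Redis: values expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]             # lazily drop expired entries
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._store[key] = (self.clock() + self.ttl, value)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)           # e.g. after PUT /videos/{video_id}


def get_video_metadata(video_id: str, cache: TtlCache,
                       load_from_db: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    row = load_from_db(video_id)
    if row is not None:
        cache.set(video_id, row)
    return row
```

Deleting the key on update, rather than rewriting it, keeps the write path simple; the next read repopulates the cache from the database.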

Performance Optimization

  • Connection Pooling: Efficient database connections
  • Batch Processing: Batch video processing for efficiency
  • Async Processing: Return from the upload request immediately and transcode in the background
  • Resource Monitoring: Monitor CPU, memory, and network usage
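Async processing usually means the upload request only enqueues a job while a pool of workers transcodes in the background. The asyncio sketch below shows that shape in a single process. `transcode` is a hypothetical stand-in that fails for IDs starting with `corrupt`; a production system would use a durable queue (for example Kafka or SQS) so jobs survive worker crashes.

```python
import asyncio
from typing import Dict, List


async def transcode(video_id: str) -> None:
    """Hypothetical stand-in for real transcoding work (e.g. an ffmpeg subprocess)."""
    await asyncio.sleep(0.01)
    if video_id.startswith("corrupt"):
        raise ValueError(f"cannot decode {video_id}")


async def worker(queue: "asyncio.Queue[str]", status: Dict[str, str]) -> None:
    while True:
        video_id = await queue.get()
        try:
            await transcode(video_id)
            status[video_id] = "ready"       # mark the video as playable
        except Exception:
            status[video_id] = "failed"      # record the failure for alerting/retry
        finally:
            queue.task_done()


async def process_uploads(video_ids: List[str], num_workers: int = 3) -> Dict[str, str]:
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    status: Dict[str, str] = {}
    workers = [asyncio.create_task(worker(queue, status)) for _ in range(num_workers)]
    for video_id in video_ids:
        queue.put_nowait(video_id)           # an upload handler would return right here
    await queue.join()                       # wait until every job has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return status


if __name__ == "__main__":
    print(asyncio.run(process_uploads(["v1", "corrupt-2", "v3"])))
```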

Monitoring and Observability

Key Metrics

  • Upload and Processing Time: Time to upload a video and to finish transcoding it
  • Streaming Latency: Time to first frame and rebuffering during playback
  • CDN Performance: CDN hit rate and response time
  • System Health: CPU, memory, and disk usage

Alerting

  • High Latency: Alert when streaming latency exceeds threshold
  • CDN Failures: Alert when CDN nodes fail
  • Processing Errors: Alert when video processing fails
  • System Errors: Alert on video service failures
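Averages can hide the slow tail that viewers actually notice, so latency alerts are often set on a high percentile instead. The sketch computes a nearest-rank p95 over a window of samples and compares it with a threshold; the threshold value and window length are deployment-specific assumptions.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def should_alert(latencies_ms: Sequence[float], threshold_ms: float, p: float = 95.0) -> bool:
    """True when the p-th percentile latency in the window exceeds the threshold."""
    return percentile(latencies_ms, p) > threshold_ms


if __name__ == "__main__":
    window = [100.0] * 18 + [5000.0, 5000.0]  # two slow requests out of twenty
    print(should_alert(window, threshold_ms=1000.0))  # True
```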

Trade-offs and Considerations

Consistency vs. Availability

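  • Choice: Eventual consistency for video metadata and counters (views, likes); strong consistency for the processing state that decides when a video becomes playable
  • Reasoning: Metadata and counts can tolerate slight delays, but playback must never point at renditions that are not yet stored. Transcoded files do not change once written, so CDN edges can cache them without coordinating with the origin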

Storage vs. Performance

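  • Choice: Cache popular videos at CDN edges and serve long-tail videos from origin storage
  • Reasoning: Edge copies multiply storage cost across nodes, but they cut latency and origin egress for the popular videos that account for much of the traffic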

Quality vs. Bandwidth

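  • Choice: Use adaptive bitrate streaming over multiple pre-encoded renditions
  • Reasoning: The player picks the highest rendition its measured throughput can sustain, trading extra storage and transcoding cost for smooth playback on slow networks and full quality on fast ones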

Common Interview Questions

Q: How would you handle video processing at scale?

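A: Decouple uploads from processing with a job queue, run transcoding on a horizontally scaled pool of stateless workers, split long videos into segments that are transcoded in parallel, and store the outputs in object storage.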

Q: How do you ensure global video delivery?

A: Cache video segments at CDN edge nodes worldwide, route each viewer to a nearby healthy edge (for example via GeoDNS or anycast), and fall back to origin storage on cache misses.

Q: How would you scale this system globally?

A: Deploy regional video servers, use geo-distributed databases, and replicate metadata across regions so that reads are served locally.

Q: How do you handle video recommendation accuracy?

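A: Combine multiple recommendation approaches (collaborative filtering, content-based signals, trending), learn from user feedback such as likes and watch time, and evaluate changes continuously, for example with A/B tests.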


Key Takeaways

  1. Video Processing: Distributed transcoding and format conversion are essential for video platforms
  2. CDN Distribution: Global CDN deployment and edge caching enable fast video delivery
  3. Content Discovery: Search algorithms and recommendation systems improve user experience
  4. Scalability: Horizontal scaling and geographic partitioning are crucial for handling large-scale video content
  5. Monitoring: Comprehensive monitoring ensures system reliability and performance