Design Twitter
System Design Challenge
What is Twitter?
Twitter is a real-time social media platform that allows users to post short messages (tweets), follow other users, and view timelines. Platforms such as Facebook, Instagram, and LinkedIn face many of the same design problems. The service provides real-time messaging, social networking, and content discovery.
Real-time timeline generation over a large social graph, driven by fan-out, is what makes systems like Twitter distinctive. Once you understand Twitter, you can tackle interview questions about similar social media platforms, because the core design challenges remain the same: timeline generation, fan-out patterns, real-time updates, and social graphs.
Functional Requirements
Core (Interview Focussed)
- Tweet Posting: Users can post tweets with text and media.
- Timeline Generation: Generate personalized timelines for users.
- Social Features: Follow/unfollow users and see their tweets.
- Real-time Updates: Show new tweets in real time.
Out of Scope
- User authentication and accounts
- Tweet search and discovery
- Direct messaging
- Tweet analytics and engagement
- Mobile app specific features
Non-Functional Requirements
Core (Interview Focussed)
- Low latency: Sub-second response time for timeline requests.
- High availability: 99.9% uptime for tweet posting and viewing.
- Scalability: Handle millions of tweets and users.
- Consistency: Timelines may be eventually consistent, but a posted tweet must be durably stored and readable as soon as the post succeeds.
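To make the 99.9% availability target concrete, this small sketch converts an availability percentage into a downtime budget. It assumes a 365-day year and spreads the yearly budget evenly over 12 months; `downtime_budget_hours` is a hypothetical helper name.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; ignores leap years


def downtime_budget_hours(availability):
    """Hours per year a service may be unavailable while meeting `availability`."""
    return (1.0 - availability) * HOURS_PER_YEAR


for target in (0.999, 0.9999):
    hours = downtime_budget_hours(target)
    print(f"{target:.2%} uptime: {hours:.2f} h/year, about {hours * 60 / 12:.0f} min/month")
```

At 99.9%, the budget is about 8.76 hours per year, or roughly 44 minutes per month, for tweet posting and viewing combined.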
Out of Scope
- Data retention policies
- Compliance and privacy regulations
💡 Interview Tip: Focus on low latency, high availability, and scalability. Interviewers care most about timeline generation, fan-out patterns, and real-time updates.
Core Entities
Entity | Key Attributes | Notes |
---|---|---|
Tweet | tweet_id, user_id, content, created_at, media_urls | Indexed by user_id for user tweets |
User | user_id, username, email, follower_count, following_count | User account information |
Follow | follow_id, follower_id, following_id, created_at | Social graph relationships |
Timeline | timeline_id, user_id, tweet_ids, last_updated | Pre-computed timelines |
Like | like_id, tweet_id, user_id, created_at | Tweet engagement data |
💡 Interview Tip: Focus on Tweet, User, and Follow as they drive timeline generation, social graphs, and content distribution.
Core APIs
Tweet Management
POST /tweets { content, media_urls } – Post a new tweet
GET /tweets/{tweet_id} – Get tweet details
DELETE /tweets/{tweet_id} – Delete a tweet
GET /tweets?user_id=&limit= – Get a user's tweets
Timeline
GET /timeline – Get the user's home timeline
GET /timeline/user/{user_id} – Get a user's profile timeline
GET /timeline/mentions – Get tweets mentioning the user
POST /timeline/refresh – Refresh the timeline
Social Features
POST /users/{user_id}/follow – Follow a user
DELETE /users/{user_id}/follow – Unfollow a user
GET /users/{user_id}/followers – Get a user's followers
GET /users/{user_id}/following – Get the users a user follows
Real-time Updates
GET /stream/timeline – WebSocket stream for timeline updates
GET /stream/mentions – WebSocket stream for mentions
POST /stream/subscribe { stream_type } – Subscribe to a stream
POST /stream/unsubscribe { stream_type } – Unsubscribe from a stream
High-Level Design
[Figure: System architecture diagram]
Key Components
- Tweet Service: Handle tweet CRUD operations
- Timeline Service: Generate and serve timelines
- Social Service: Manage follow relationships and social graphs
- Fan-out Service: Distribute tweets to followers
- Real-time Service: Handle WebSocket connections and real-time updates
- Database: Persistent storage for tweets, users, and relationships
Mapping Core Functional Requirements to Components
Functional Requirement | Responsible Components | Key Considerations |
---|---|---|
Tweet Posting | Tweet Service, Fan-out Service | Tweet storage, fan-out distribution |
Timeline Generation | Timeline Service, Fan-out Service | Timeline computation, caching |
Social Features | Social Service, Database | Follow relationships, social graphs |
Real-time Updates | Real-time Service, Fan-out Service | WebSocket connections, update broadcasting |
Detailed Design
Fan-out Service
Purpose: Distribute tweets to followers using fan-out patterns.
Key Design Decisions:
- Fan-out Strategy: Use push model for active users, pull model for inactive users
- Batch Processing: Process fan-out operations in batches
- Error Handling: Retry failed timeline writes and log errors without halting the rest of the fan-out
- Performance Optimization: Optimize fan-out for high-volume users
Algorithm: Fan-out distribution
1. Receive new tweet from user
2. Get user's follower list
3. For each follower:
   - Check follower's activity level
   - If active user:
     - Add tweet to follower's timeline
     - Update timeline cache
   - If inactive user:
     - Skip immediate fan-out
     - Use pull model on next access
4. Handle fan-out errors:
   - Retry failed operations
   - Log error details
   - Continue processing
5. Update fan-out statistics
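The fan-out steps can be sketched in Python as follows. This is a minimal in-memory illustration, not a production service: plain dicts stand in for the social graph and the timeline cache, "active" is assumed to mean "seen within the last 7 days", and the constants (`ACTIVE_WINDOW_S`, `BATCH_SIZE`, `MAX_RETRIES`, `TIMELINE_MAX`) and class names are hypothetical.

```python
import logging
import time
from collections import deque
from dataclasses import dataclass

log = logging.getLogger("fanout")

# All constants are hypothetical tuning values, not taken from the design.
ACTIVE_WINDOW_S = 7 * 24 * 3600   # "active" = seen within the last 7 days
BATCH_SIZE = 1000                  # followers handled per batch
MAX_RETRIES = 3                    # attempts per timeline write
TIMELINE_MAX = 800                 # tweet IDs kept per cached timeline


@dataclass
class FanoutStats:
    pushed: int = 0
    skipped_inactive: int = 0
    failed: int = 0


class FanoutService:
    def __init__(self, followers, last_seen, timeline_cache):
        self.followers = followers      # author_id -> list of follower IDs
        self.last_seen = last_seen      # user_id -> unix time of last activity
        self.cache = timeline_cache     # user_id -> deque of tweet IDs, newest first
        self.stats = FanoutStats()

    def is_active(self, user_id, now):
        return now - self.last_seen.get(user_id, 0.0) <= ACTIVE_WINDOW_S

    def push(self, follower_id, tweet_id):
        if follower_id not in self.cache:
            self.cache[follower_id] = deque(maxlen=TIMELINE_MAX)
        # appendleft keeps newest first; a full deque drops its oldest (rightmost) ID.
        self.cache[follower_id].appendleft(tweet_id)

    def fan_out(self, author_id, tweet_id, now=None):
        now = time.time() if now is None else now
        followers = self.followers.get(author_id, [])
        for start in range(0, len(followers), BATCH_SIZE):
            # In a real deployment each batch would become one pipelined cache write.
            for follower_id in followers[start:start + BATCH_SIZE]:
                if not self.is_active(follower_id, now):
                    # Inactive: no push; their timeline is pulled on next access.
                    self.stats.skipped_inactive += 1
                    continue
                for attempt in range(1, MAX_RETRIES + 1):
                    try:
                        self.push(follower_id, tweet_id)
                        self.stats.pushed += 1
                        break
                    except Exception:
                        log.exception("push to %s failed (attempt %d)", follower_id, attempt)
                else:
                    # Retries exhausted: record the failure and keep processing.
                    self.stats.failed += 1
        return self.stats
```

Capping each cached timeline with `deque(maxlen=...)` bounds memory per user; inactive followers cost nothing at write time and are served by the pull path when they return.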
Timeline Service
Purpose: Generate and serve personalized timelines for users.
Key Design Decisions:
- Timeline Computation: Pre-compute timelines for active users
- Caching Strategy: Cache timelines for fast access
- Timeline Merging: Merge tweets from multiple sources
- Personalization: Personalize timelines based on user preferences
Algorithm: Timeline generation
1. Receive timeline request
2. Check cache for existing timeline
3. If not cached:
   - Get user's following list
   - Fetch recent tweets from followed users
   - Merge tweets by timestamp
   - Apply personalization:
     - User's interests
     - Tweet engagement
     - Recency factor
   - Cache generated timeline
4. Return timeline to user
5. Update timeline statistics
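A minimal sketch of the generation path, assuming in-memory dicts in place of the tweet store and Redis: on a cache miss it k-way merges each followed user's newest tweets by timestamp with `heapq.merge`, then re-ranks the candidates. The `score` function (author-interest weight times log-scaled likes times an exponential recency decay with a 6-hour half-life) is a hypothetical stand-in for the personalization step, not a known production formula.

```python
import heapq
import itertools
import math
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: str
    user_id: str
    created_at: float   # unix seconds
    likes: int = 0


HALF_LIFE_S = 6 * 3600  # hypothetical: recency weight halves every 6 hours


def score(tweet, interest, now):
    """Hypothetical ranking: author interest x engagement x recency decay."""
    engagement = 1.0 + math.log1p(tweet.likes)
    recency = 0.5 ** ((now - tweet.created_at) / HALF_LIFE_S)
    return interest.get(tweet.user_id, 1.0) * engagement * recency


class TimelineService:
    def __init__(self, following, tweets_by_user, interests, cache, limit=50):
        self.following = following            # user_id -> set of followed user IDs
        self.tweets_by_user = tweets_by_user  # user_id -> list[Tweet], newest first
        self.interests = interests            # user_id -> {author_id: weight}
        self.cache = cache                    # user_id -> list[Tweet]; stands in for Redis
        self.limit = limit

    def get_timeline(self, user_id, now=None):
        cached = self.cache.get(user_id)
        if cached is not None:
            return cached  # cache hit (no expiry in this sketch)
        now = time.time() if now is None else now
        # Cache miss: take each followed user's newest tweets...
        per_author = [self.tweets_by_user.get(uid, [])[: self.limit]
                      for uid in self.following.get(user_id, ())]
        # ...k-way merge them newest-first by timestamp, keeping the top `limit`...
        merged = heapq.merge(*per_author, key=lambda t: t.created_at, reverse=True)
        candidates = list(itertools.islice(merged, self.limit))
        # ...then re-rank the candidates with the personalization score.
        interest = self.interests.get(user_id, {})
        timeline = sorted(candidates, key=lambda t: score(t, interest, now), reverse=True)
        self.cache[user_id] = timeline
        return timeline
```

Because every per-author list is already sorted newest first, the merge reads only as many tweets as the timeline needs instead of sorting everything the user's followees ever posted.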
Real-time Service
Purpose: Handle WebSocket connections and broadcast real-time updates.
Key Design Decisions:
- WebSocket Connections: Maintain persistent connections for real-time updates
- Message Broadcasting: Broadcast updates to relevant users
- Connection Management: Handle connection drops and reconnections
- Update Filtering: Send relevant updates to each user
Algorithm: Real-time update broadcasting
1. User connects to timeline stream
2. Send current timeline to user
3. When new tweet arrives:
   - Check if user follows tweet author
   - If follows:
     - Add tweet to user's timeline
     - Broadcast update to user
   - If mentions user:
     - Send mention notification
     - Update mentions timeline
4. Handle connection drops gracefully
5. On reconnect, replay the updates the user missed while disconnected
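One way to support missed-update replay is to give each user's event stream a strictly increasing sequence number and keep a bounded buffer of recent events. The sketch below assumes that approach: a `send` callable stands in for a WebSocket connection, the buffer size is a hypothetical value, and sending the initial timeline (step 2) is omitted. `RealtimeHub` and `Event` are hypothetical names. If a client's `last_seq` is older than everything still buffered, it would need to refetch its timeline over the regular API.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

BUFFER_SIZE = 500  # hypothetical: recent events retained per user for replay


@dataclass(frozen=True)
class Event:
    seq: int        # per-user, strictly increasing
    kind: str       # "tweet" or "mention"
    tweet_id: str


class RealtimeHub:
    def __init__(self, followers):
        self.followers = followers                  # author_id -> set of follower IDs
        self.connections = {}                       # user_id -> send(event) callable
        self.buffers = defaultdict(lambda: deque(maxlen=BUFFER_SIZE))
        self.last_seq = defaultdict(int)

    def connect(self, user_id, send, last_seq=None):
        """Register a connection; on reconnect, replay events newer than last_seq."""
        self.connections[user_id] = send
        if last_seq is not None:
            for event in self.buffers[user_id]:
                if event.seq > last_seq:
                    send(event)

    def disconnect(self, user_id):
        self.connections.pop(user_id, None)

    def _deliver(self, user_id, kind, tweet_id):
        self.last_seq[user_id] += 1
        event = Event(self.last_seq[user_id], kind, tweet_id)
        self.buffers[user_id].append(event)         # buffered even while offline
        send = self.connections.get(user_id)
        if send is not None:
            send(event)

    def on_new_tweet(self, author_id, tweet_id, mentions=()):
        for user_id in self.followers.get(author_id, ()):   # users who follow the author
            self._deliver(user_id, "tweet", tweet_id)
        for user_id in mentions:                            # mentioned users
            self._deliver(user_id, "mention", tweet_id)
```

A client records the `seq` of the last event it processed and sends it when reconnecting, so a dropped connection costs a replay rather than a full timeline reload.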
Database Design
Tweets Table
Field | Type | Description |
---|---|---|
tweet_id | VARCHAR(36) | Primary key |
user_id | VARCHAR(36) | Tweet author |
content | TEXT | Tweet content |
media_urls | JSON | Media attachments |
created_at | TIMESTAMP | Creation timestamp |
Indexes:
- idx_user_created on (user_id, created_at) - User tweets
- idx_created_at on (created_at) - Recent tweets
Users Table
Field | Type | Description |
---|---|---|
user_id | VARCHAR(36) | Primary key |
username | VARCHAR(100) | Username |
email | VARCHAR(255) | Email address |
follower_count | INT | Number of followers |
following_count | INT | Number following |
created_at | TIMESTAMP | Account creation |
Indexes:
- idx_username on (username) - Username lookup
- idx_follower_count on (follower_count) - Popular users
Follows Table
Field | Type | Description |
---|---|---|
follow_id | VARCHAR(36) | Primary key |
follower_id | VARCHAR(36) | User who follows |
following_id | VARCHAR(36) | User being followed |
created_at | TIMESTAMP | Follow timestamp |
Indexes:
- idx_follower on (follower_id) - Accounts a user follows
- idx_following on (following_id) - A user's followers
- unique_follow on (follower_id, following_id) - Prevent duplicate follows
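The Follows table can be exercised with Python's built-in `sqlite3`, used here only as a convenient stand-in for the production database (column types are adapted to SQLite's `TEXT`). The sketch shows how `unique_follow` makes a repeated follow a no-op via `INSERT OR IGNORE`. It omits `idx_follower`, on the assumption that the composite `unique_follow` index, whose leftmost column is `follower_id`, already serves those lookups. The helper functions are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follows (
    follow_id    TEXT PRIMARY KEY,
    follower_id  TEXT NOT NULL,   -- user who follows
    following_id TEXT NOT NULL,   -- user being followed
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE UNIQUE INDEX unique_follow ON follows (follower_id, following_id);
CREATE INDEX idx_following ON follows (following_id);
""")


def follow(follower_id, following_id):
    # INSERT OR IGNORE turns a duplicate follow into a no-op instead of an error.
    conn.execute(
        "INSERT OR IGNORE INTO follows (follow_id, follower_id, following_id) VALUES (?, ?, ?)",
        (str(uuid.uuid4()), follower_id, following_id),
    )


def unfollow(follower_id, following_id):
    conn.execute(
        "DELETE FROM follows WHERE follower_id = ? AND following_id = ?",
        (follower_id, following_id),
    )


def followers_of(user_id):
    # Served by idx_following.
    rows = conn.execute(
        "SELECT follower_id FROM follows WHERE following_id = ? ORDER BY follower_id", (user_id,)
    )
    return [row[0] for row in rows]


def following_of(user_id):
    # Served by the leftmost column of unique_follow.
    rows = conn.execute(
        "SELECT following_id FROM follows WHERE follower_id = ? ORDER BY following_id", (user_id,)
    )
    return [row[0] for row in rows]
```

Making follow idempotent at the database level means a client retrying POST /users/{user_id}/follow cannot create duplicate edges in the social graph.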
Timelines Table
Field | Type | Description |
---|---|---|
timeline_id | VARCHAR(36) | Primary key |
user_id | VARCHAR(36) | Timeline owner |
tweet_ids | JSON | Tweet IDs in timeline |
last_updated | TIMESTAMP | Last update |
Indexes:
- idx_user_id on (user_id) - User timelines
- idx_last_updated on (last_updated) - Recent timelines
Scalability Considerations
Horizontal Scaling
- Tweet Service: Scale horizontally with load balancers
- Timeline Service: Use consistent hashing for timeline partitioning
- Fan-out Service: Run fan-out as a pool of stateless workers consuming new-tweet events, so throughput scales by adding workers
- Database: Shard tweets and users by user_id
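Consistent hashing for timeline partitioning can be sketched as a hash ring with virtual nodes: each shard is hashed onto the ring many times, and a `user_id` maps to the first virtual node clockwise from its own hash. Removing a shard therefore remaps only the keys that lived on it. The shard names, the 100 virtual nodes per shard, and the choice of BLAKE2b as the hash are assumptions for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys such as user_id to timeline shards using virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("no nodes in the ring")
        # First virtual node clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]
```

For example, `ConsistentHashRing(["tl-1", "tl-2", "tl-3"]).node_for("user42")` picks the shard holding that user's timeline; with plain `hash(user_id) % N`, changing N would instead move most users to a different shard.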
Caching Strategy
- Redis: Cache timelines and recent tweets
- CDN: Cache static content and media
- Application Cache: Cache frequently accessed data
Performance Optimization
- Connection Pooling: Efficient database connections
- Batch Processing: Batch fan-out operations for efficiency
- Async Processing: Non-blocking tweet processing
- Resource Monitoring: Monitor CPU, memory, and network usage
Monitoring and Observability
Key Metrics
- Tweet Latency: Average tweet posting time
- Timeline Latency: Average timeline generation time
- Fan-out Rate: Tweets distributed per second
- System Health: CPU, memory, and disk usage
Alerting
- High Latency: Alert when tweet-posting or timeline latency exceeds a threshold
- Fan-out Failures: Alert when fan-out processing fails
- Connection Drops: Alert when WebSocket connections drop frequently
- System Errors: Alert on tweet processing failures
Trade-offs and Considerations
Consistency vs. Availability
- Choice: Eventual consistency for timelines, strong consistency for tweets
- Reasoning: Timelines can tolerate slight delays, tweets need immediate accuracy
Latency vs. Throughput
- Choice: Optimize for latency with timeline caching
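- Reasoning: Timeline reads far outnumber tweet posts, so serving timelines from cache keeps the common path fast, at the cost of extra write work (fan-out and cache updates) on every post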
Storage vs. Performance
- Choice: Use timeline pre-computation for better performance
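- Reasoning: Pre-computed timelines copy each tweet ID into every follower's timeline, trading extra storage for fast reads instead of assembling timelines at query time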
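A back-of-envelope estimate shows the scale of that storage trade-off. Every input here is a hypothetical assumption, not a figure from this design: 100 million users with a cached timeline, 800 tweet IDs per timeline, and either an 8-byte integer ID or the 36-character `VARCHAR(36)` ID used in the schema. Replication and data-structure overhead are ignored.

```python
def timeline_storage_bytes(users, ids_per_timeline, bytes_per_id):
    """Raw bytes needed to store every pre-computed timeline's tweet IDs."""
    return users * ids_per_timeline * bytes_per_id


USERS = 100_000_000       # hypothetical users with a cached timeline
IDS_PER_TIMELINE = 800    # hypothetical cap per timeline

for label, id_bytes in (("64-bit integer IDs", 8), ("VARCHAR(36) IDs", 36)):
    total = timeline_storage_bytes(USERS, IDS_PER_TIMELINE, id_bytes)
    print(f"{label}: {total / 1e9:,.0f} GB")
```

Under these assumptions the raw tweet IDs take about 640 GB with 8-byte IDs and about 2,880 GB with 36-byte string IDs, which is why compact numeric tweet IDs matter when timelines are pre-computed.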
Common Interview Questions
Q: How would you handle high-volume users?
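A: Under pure push fan-out, one tweet from an account with millions of followers means millions of timeline writes. A common approach is a hybrid: push tweets from ordinary accounts at write time, skip push for accounts above a follower-count threshold, and merge their recent tweets into followers' timelines at read time, with timeline pre-computation and caching serving everything else.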
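The hybrid can be sketched in two pieces: a write-time check that decides whether to push, and a read-time merge of the pushed timeline with recent tweets from followed high-follower accounts. The threshold of 1,000,000 followers, the `(created_at, tweet_id)` tuple format, and the function names are all hypothetical.

```python
import heapq
import itertools

CELEBRITY_THRESHOLD = 1_000_000  # hypothetical follower count above which push is skipped


def should_push(follower_count):
    """Fan out on write only for authors below the threshold."""
    return follower_count < CELEBRITY_THRESHOLD


def home_timeline(precomputed, celebrity_tweets, limit=50):
    """Merge a pushed timeline with high-follower tweets fetched at read time.

    precomputed: list of (created_at, tweet_id), newest first, built by fan-out.
    celebrity_tweets: one such newest-first list per followed high-follower account.
    """
    merged = heapq.merge(precomputed, *celebrity_tweets, key=lambda t: t[0], reverse=True)
    return [tweet_id for _, tweet_id in itertools.islice(merged, limit)]
```

The threshold trades write amplification for read-time work: raising it pushes more tweets at write time, lowering it makes more home-timeline reads perform a merge.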
Q: How do you ensure timeline consistency?
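A: Treat timelines as eventually consistent. Fan-out updates cached timelines asynchronously and retries failed writes, the real-time stream pushes new tweets as they arrive, and reconnecting clients replay missed updates, so every follower's timeline converges to the same set of tweets.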
Q: How would you scale this system globally?
A: Deploy regional tweet servers, use geo-distributed databases, and implement data replication strategies.
Q: How do you handle real-time updates at scale?
A: Use WebSocket connections, message broadcasting, and efficient fan-out patterns to handle real-time updates at scale.
Key Takeaways
- Fan-out Patterns: Essential for distributing tweets to followers efficiently
- Timeline Generation: Pre-computation and caching provide fast timeline access
- Real-time Updates: WebSocket connections and message broadcasting enable real-time social media
- Scalability: Horizontal scaling and partitioning are crucial for handling large-scale social media
- Monitoring: Comprehensive monitoring ensures system reliability and performance