Design Strava
What is Strava?
Strava is a fitness tracking and social networking platform that allows users to track their workouts, share activities, and compete on leaderboards. It's similar to Garmin Connect, Nike Run Club, or MapMyRun. The service provides GPS tracking, social features, and performance analytics.
What sets systems like Strava apart is the combination of GPS data processing, social features, and performance analytics. Understanding this design prepares you for interview questions about similar fitness platforms, because the core challenges (GPS processing, social features, leaderboards, and performance analytics) are the same.
Functional Requirements
Core (Interview Focussed)
- Activity Tracking: Track workouts with GPS data and metrics.
- Social Features: Share activities and follow other athletes.
- Leaderboards: Compete on segments and routes.
- Performance Analytics: Analyze workout data and progress.
Out of Scope
- User authentication and accounts
- Premium features and subscriptions
- Device integration and sync
- Training plans and coaching
- Mobile app specific features
Non-Functional Requirements
Core (Interview Focussed)
- High availability: 99.9% uptime for activity tracking.
- Scalability: Handle millions of athletes and activities.
- Performance: Fast activity upload and leaderboard updates.
- Data accuracy: Accurate GPS data processing and metrics.
Out of Scope
- Data retention policies
- Compliance and privacy regulations
💡 Interview Tip: Focus on high availability, scalability, and performance. Interviewers care most about GPS processing, social features, and leaderboard algorithms.
Core Entities
Entity | Key Attributes | Notes |
---|---|---|
Activity | activity_id, user_id, activity_type, start_time, duration | Indexed by user_id for user activities |
GPSPoint | point_id, activity_id, latitude, longitude, timestamp | GPS track points |
Segment | segment_id, name, start_point, end_point, distance | Competition segments |
Leaderboard | leaderboard_id, segment_id, activity_id, time, rank | Segment leaderboards |
User | user_id, username, email, follower_count | User account information |
💡 Interview Tip: Focus on Activity, GPSPoint, and Segment as they drive activity tracking, GPS processing, and leaderboard generation.
Core APIs
Activity Management
- POST /activities { activity_type, start_time, duration, gps_data } – Upload a new activity
- GET /activities/{activity_id} – Get activity details
- GET /activities?user_id=&type=&limit= – List activities with filters
- DELETE /activities/{activity_id} – Delete an activity
GPS Processing
- POST /activities/{activity_id}/gps { gps_points[] } – Upload GPS data
- GET /activities/{activity_id}/gps – Get GPS track data
- GET /activities/{activity_id}/map – Get activity map visualization
Social Features
- POST /activities/{activity_id}/like – Like an activity
- GET /feed?user_id=&limit= – Get user activity feed
- POST /users/{user_id}/follow – Follow a user
- GET /users/{user_id}/followers – Get user followers
Leaderboards
- GET /segments/{segment_id}/leaderboard – Get segment leaderboard
- GET /segments?location=&radius= – Find segments near location
- POST /segments { name, start_point, end_point } – Create a segment
- GET /users/{user_id}/achievements – Get user achievements
High-Level Design
System Architecture Diagram
Key Components
- Activity Service: Handle activity CRUD operations
- GPS Processing Service: Process and analyze GPS data
- Social Service: Manage social features and feeds
- Leaderboard Service: Generate and maintain leaderboards
- Segment Service: Manage segments and route matching
- Database: Persistent storage for activities and user data
Mapping Core Functional Requirements to Components
Functional Requirement | Responsible Components | Key Considerations |
---|---|---|
Activity Tracking | Activity Service, GPS Processing Service | GPS data processing, metrics calculation |
Social Features | Social Service, Database | Feed generation, social interactions |
Leaderboards | Leaderboard Service, Segment Service | Segment matching, ranking algorithms |
Performance Analytics | GPS Processing Service, Database | Data analysis, progress tracking |
Detailed Design
GPS Processing Service
Purpose: Process GPS data and calculate activity metrics.
Key Design Decisions:
- Data Validation: Validate GPS data for accuracy and completeness
- Metrics Calculation: Calculate distance, pace, elevation, and other metrics
- Data Compression: Compress GPS data for storage efficiency
- Route Matching: Match activities to known segments and routes
Algorithm: GPS data processing
1. Receive GPS track data
2. Validate GPS points:
   - Check coordinate accuracy
   - Remove outliers and noise
   - Fill gaps in track data
3. Calculate metrics:
   - Total distance
   - Average pace
   - Elevation gain/loss
   - Heart rate zones
4. Compress track data:
   - Remove redundant points
   - Use efficient encoding
5. Store processed data
6. Trigger segment matching
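Steps 2 and 3 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Strava's actual pipeline: the `GPSPoint` dataclass, the 15 m/s speed cutoff, and the function names are all hypothetical, distance uses the haversine formula, and gap filling and heart rate zones are left out.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass
class GPSPoint:
    latitude: float   # degrees
    longitude: float  # degrees
    timestamp: float  # seconds since epoch
    elevation: float  # metres


def haversine_m(a: GPSPoint, b: GPSPoint) -> float:
    """Great-circle distance between two points in metres."""
    lat1, lat2 = math.radians(a.latitude), math.radians(b.latitude)
    dlat = lat2 - lat1
    dlon = math.radians(b.longitude - a.longitude)
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def remove_outliers(points, max_speed_mps=15.0):
    """Drop invalid coordinates and points implying an implausible speed.

    The 15 m/s default is an assumed threshold, not a documented value.
    """
    kept = []
    for p in points:
        if not (-90 <= p.latitude <= 90 and -180 <= p.longitude <= 180):
            continue
        if kept:
            dt = p.timestamp - kept[-1].timestamp
            if dt <= 0 or haversine_m(kept[-1], p) / dt > max_speed_mps:
                continue
        kept.append(p)
    return kept


def summarize(points):
    """Return total distance (m), elevation gain (m), and average pace (s/km)."""
    pairs = list(zip(points, points[1:]))
    distance = sum(haversine_m(a, b) for a, b in pairs)
    gain = sum(max(0.0, b.elevation - a.elevation) for a, b in pairs)
    duration = points[-1].timestamp - points[0].timestamp if len(points) > 1 else 0.0
    pace = duration / (distance / 1000) if distance > 0 else None
    return {"distance_m": distance, "elevation_gain_m": gain, "pace_s_per_km": pace}
```

A caller would run `summarize(remove_outliers(raw_points))` and persist the result on the activity record. Summing raw elevation deltas tends to overstate gain on noisy data, so a real pipeline would likely smooth elevation first.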
Leaderboard Service
Purpose: Generate and maintain leaderboards for segments.
Key Design Decisions:
- Segment Matching: Match activities to segments accurately
- Ranking Algorithm: Rank athletes by time or other metrics
- Real-time Updates: Update leaderboards in near real time as activities are processed
- Historical Data: Maintain historical leaderboard data
Algorithm: Leaderboard generation
1. Receive completed activity
2. Match activity to segments:
   - Check if activity passes through segment
   - Calculate segment time
   - Validate segment completion
3. Update leaderboard:
   - Add new entry to leaderboard
   - Recalculate rankings
   - Update personal records
4. Broadcast updates:
   - Notify followers
   - Update social feeds
   - Send achievement notifications
5. Store leaderboard data
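Steps 2 and 3 above might look like the following sketch. It rests on simplifying assumptions: a segment is matched only by its start and end points within a hypothetical 25 m tolerance (a production matcher would also compare the path in between), and the leaderboard is an in-memory sorted list rather than a database or cache. The names `segment_time` and `SegmentLeaderboard` are illustrative.

```python
import bisect
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; adequate over short distances."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6_371_000 * math.hypot(x, y)


def segment_time(track, start, end, tolerance_m=25.0):
    """Return seconds between passing `start` and then `end`, or None.

    `track` is a list of (lat, lon, timestamp) tuples ordered by time;
    `start` and `end` are (lat, lon) tuples.
    """
    start_ts = None
    for lat, lon, ts in track:
        if start_ts is None:
            if distance_m(lat, lon, *start) <= tolerance_m:
                start_ts = ts
        elif distance_m(lat, lon, *end) <= tolerance_m:
            return ts - start_ts
    return None


class SegmentLeaderboard:
    """Keeps each athlete's best time in a list sorted by (time, user_id)."""

    def __init__(self):
        self._entries = []  # sorted list of (time_s, user_id)
        self._best = {}     # user_id -> best time_s

    def submit(self, user_id, time_s):
        """Record an effort; return True if it is a new personal record."""
        previous = self._best.get(user_id)
        if previous is not None and previous <= time_s:
            return False
        if previous is not None:
            self._entries.remove((previous, user_id))
        bisect.insort(self._entries, (time_s, user_id))
        self._best[user_id] = time_s
        return True

    def top(self, n=10):
        """Return [(rank, user_id, time_s)] with rank starting at 1."""
        return [(i + 1, u, t) for i, (t, u) in enumerate(self._entries[:n])]
```

Because the list stays sorted, rank is just the position in the list, so no separate recalculation pass is needed. The return value of `submit` is the natural trigger for the personal-record and notification steps.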
Social Service
Purpose: Manage social features and activity feeds.
Key Design Decisions:
- Feed Generation: Generate personalized activity feeds
- Social Interactions: Handle likes, comments, and follows
- Privacy Controls: Respect user privacy settings
- Content Moderation: Moderate user-generated content
Algorithm: Activity feed generation
1. Receive feed request from user
2. Get user's following list
3. Fetch recent activities from followed users
4. Apply privacy filters:
   - Check user privacy settings
   - Filter private activities
5. Rank activities:
   - Recency factor
   - User engagement
   - Activity type preference
6. Return personalized feed
7. Cache feed for performance
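The filtering and ranking steps above can be illustrated with a small scoring function. The `FeedActivity` fields, the score weights, and the `build_feed` signature are assumptions made for this sketch; the source does not specify a ranking formula.

```python
import time
from dataclasses import dataclass


@dataclass
class FeedActivity:
    activity_id: str
    user_id: str
    activity_type: str
    start_time: float        # seconds since epoch
    like_count: int = 0
    is_private: bool = False


def build_feed(following, activities, preferred_types=(), now=None, limit=20):
    """Return up to `limit` visible activities, highest score first."""
    now = time.time() if now is None else now

    def score(a):
        age_hours = max(0.0, (now - a.start_time) / 3600)
        recency = 1.0 / (1.0 + age_hours)          # newer activities score higher
        engagement = 0.1 * a.like_count            # hypothetical weight
        preference = 0.5 if a.activity_type in preferred_types else 0.0
        return recency + engagement + preference

    # Privacy filter: only followed users, and never private activities.
    visible = [a for a in activities if a.user_id in following and not a.is_private]
    return sorted(visible, key=score, reverse=True)[:limit]
```

This is a fan-out-on-read approach: the feed is computed when requested. For users who follow many athletes, precomputing feeds at upload time is the usual alternative, at the cost of more writes.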
Database Design
Activities Table
Field | Type | Description |
---|---|---|
activity_id | VARCHAR(36) | Primary key |
user_id | VARCHAR(36) | Activity owner |
activity_type | VARCHAR(50) | Type of activity |
start_time | TIMESTAMP | Activity start |
duration | INT | Duration in seconds |
distance | DECIMAL(10,2) | Distance covered |
elevation_gain | DECIMAL(8,2) | Elevation gained |
created_at | TIMESTAMP | Creation timestamp |
Indexes:
- idx_user_id on (user_id): User activities
- idx_start_time on (start_time): Recent activities
- idx_activity_type on (activity_type): Activity type queries
GPS Points Table
Field | Type | Description |
---|---|---|
point_id | VARCHAR(36) | Primary key |
activity_id | VARCHAR(36) | Associated activity |
latitude | DECIMAL(10,8) | Latitude coordinate |
longitude | DECIMAL(11,8) | Longitude coordinate |
timestamp | TIMESTAMP | Point timestamp |
elevation | DECIMAL(8,2) | Elevation at point |
Indexes:
- idx_activity_timestamp on (activity_id, timestamp): Activity track
- idx_coordinates on (latitude, longitude): Geospatial queries
Segments Table
Field | Type | Description |
---|---|---|
segment_id | VARCHAR(36) | Primary key |
name | VARCHAR(255) | Segment name |
start_lat | DECIMAL(10,8) | Start latitude |
start_lng | DECIMAL(11,8) | Start longitude |
end_lat | DECIMAL(10,8) | End latitude |
end_lng | DECIMAL(11,8) | End longitude |
distance | DECIMAL(8,2) | Segment distance |
created_at | TIMESTAMP | Creation timestamp |
Indexes:
- idx_start on (start_lat, start_lng): Start point queries
- idx_end on (end_lat, end_lng): End point queries
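One way a start-point index could serve the "find segments near location" query is a bounding-box prefilter: convert the radius into latitude and longitude ranges, then select rows whose start point falls inside them. The sketch below assumes segments are plain dicts with `start_lat` and `start_lng` keys, and it ignores the poles and the antimeridian.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def bounding_box(lat, lng, radius_m):
    """Return (min_lat, max_lat, min_lng, max_lng) enclosing the circle."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    # Longitude degrees shrink with latitude; guard against division by ~0.
    dlng = dlat / max(math.cos(math.radians(lat)), 1e-6)
    return lat - dlat, lat + dlat, lng - dlng, lng + dlng


def segments_near(segments, lat, lng, radius_m):
    """Filter segments whose start point lies inside the bounding box."""
    min_lat, max_lat, min_lng, max_lng = bounding_box(lat, lng, radius_m)
    return [
        s for s in segments
        if min_lat <= s["start_lat"] <= max_lat
        and min_lng <= s["start_lng"] <= max_lng
    ]
```

The box is larger than the circle, so results near the corners need an exact distance check afterwards. In a database the same ranges would become `BETWEEN` predicates on the indexed columns; a dedicated spatial index is a common alternative.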
Leaderboards Table
Field | Type | Description |
---|---|---|
leaderboard_id | VARCHAR(36) | Primary key |
segment_id | VARCHAR(36) | Associated segment |
activity_id | VARCHAR(36) | Associated activity |
user_id | VARCHAR(36) | Athlete |
time | INT | Segment time in seconds |
rank | INT | Leaderboard rank |
created_at | TIMESTAMP | Entry timestamp |
Indexes:
- idx_segment_rank on (segment_id, rank): Segment leaderboard
- idx_user_id on (user_id): User achievements
Scalability Considerations
Horizontal Scaling
- Activity Service: Scale horizontally with load balancers
- GPS Processing Service: Use consistent hashing for data distribution
- Social Service: Scale social features with distributed systems
- Database: Shard activities by user_id
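To make the consistent hashing and user_id sharding ideas concrete, here is a small hash ring. It is a sketch under assumptions: shard names, the virtual node count, and the use of SHA-256 are illustrative choices, not details from the design above.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys such as user_id to nodes using virtual nodes on a ring."""

    def __init__(self, nodes, vnodes=100):
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring                       # sorted (hash, node) pairs
        self._keys = [h for h, _ in ring]       # hashes only, for bisect

    @staticmethod
    def _hash(key):
        # Stable across processes, unlike the built-in hash().
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, key):
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect_right(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]
```

The benefit over `hash(user_id) % shard_count` is that adding a shard moves only the keys that land on the new shard, rather than reshuffling almost everything. Keeping all of a user's activities on one shard also keeps the "list my activities" query on a single node.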
Caching Strategy
- Redis: Cache leaderboards and activity feeds
- CDN: Cache static content and images
- Application Cache: Cache frequently accessed data
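The caching entries above usually follow a cache-aside pattern: read from the cache, fall back to the source on a miss, and store the result with a time to live. The sketch below uses an in-process dict as a stand-in for Redis; the `TTLCache` class and its method names are hypothetical.

```python
import time


class TTLCache:
    """Cache-aside helper with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._store = {}             # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) on miss or expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key, for example after a new leaderboard entry is written."""
        self._store.pop(key, None)
```

A short TTL suits leaderboards, which the design treats as eventually consistent, while explicit invalidation is available for cases where a stale entry would be noticeable, such as an athlete's own new record.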
Performance Optimization
- Connection Pooling: Efficient database connections
- Batch Processing: Batch GPS data processing for efficiency
- Async Processing: Non-blocking activity processing
- Resource Monitoring: Monitor CPU, memory, and network usage
Monitoring and Observability
Key Metrics
- Activity Upload Time: Average time to upload activities
- GPS Processing Time: Average time to process GPS data
- Leaderboard Update Time: Average time to update leaderboards
- System Health: CPU, memory, and disk usage
Alerting
- High Latency: Alert when processing time exceeds threshold
- GPS Processing Errors: Alert when GPS data processing fails
- Leaderboard Errors: Alert when leaderboard updates fail
- System Errors: Alert on activity processing failures
Trade-offs and Considerations
Consistency vs. Availability
- Choice: Eventual consistency for leaderboards, strong consistency for activities
- Reasoning: Leaderboards can tolerate slight delays, whereas activities need immediate accuracy
Storage vs. Performance
- Choice: Use data compression for GPS data
- Reasoning: Balance between storage costs and query performance
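One common way to remove redundant track points is Ramer-Douglas-Peucker simplification, which drops points that deviate from the simplified line by less than a tolerance. The source does not name a specific algorithm, so treat this as one possible choice. The sketch assumes points are already projected to flat (x, y) coordinates in metres.

```python
import math


def _distance_to_segment(p, a, b):
    """Distance from point p to the line segment a-b in a flat plane."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, epsilon):
    """Keep only points that deviate more than epsilon from the simplified path."""
    if len(points) < 3:
        return list(points)
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        return [points[0], points[-1]]
    left = simplify(points[: index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves
```

A larger `epsilon` saves more storage but loses detail on tight corners, which is the storage versus performance trade-off in miniature. Very long tracks may need an iterative version to avoid deep recursion.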
Accuracy vs. Performance
- Choice: Use approximation algorithms for GPS processing
- Reasoning: Balance between GPS accuracy and processing speed
Common Interview Questions
Q: How would you handle GPS data accuracy?
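A: Validate coordinates on upload, remove outliers and noise (for example, points that imply an implausible speed), fill gaps in the track, and cross-check against other data sources where they are available.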
Q: How do you generate leaderboards efficiently?
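A: Match each completed activity to segments asynchronously, store entries indexed by (segment_id, rank), and serve reads from a cache such as Redis. Leaderboards are eventually consistent, so ranking updates do not block activity uploads.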
Q: How would you scale this system globally?
A: Deploy regional processing servers, use geo-distributed databases, and implement data replication strategies.
Q: How do you handle social features at scale?
A: Build each feed from the recent activities of followed users, apply privacy filters, rank by recency and engagement, and cache the result. Scale the Social Service horizontally as load grows.
Key Takeaways
- GPS Processing: Data validation and metrics calculation are essential for accurate activity tracking
- Leaderboards: Segment matching and ranking algorithms are crucial for competitive features
- Social Features: Feed generation and social interactions enhance user engagement
- Scalability: Horizontal scaling and partitioning are crucial for handling large-scale fitness data
- Monitoring: Comprehensive monitoring ensures system reliability and performance