Design Strava

System Design Challenge

Difficulty: medium
Duration: 45-60 minutes
Tags: activity-tracking, social-feeds, leaderboards, gps-processing

What is Strava?

Strava is a fitness tracking and social networking platform that allows users to track their workouts, share activities, and compete on leaderboards. It's similar to Garmin Connect, Nike Run Club, or MapMyRun. The service provides GPS tracking, social features, and performance analytics.

What sets Strava apart is the combination of GPS data processing with social features and performance analytics. Understanding its design prepares you for interview questions about similar fitness platforms, because the core challenges (GPS processing, social features, leaderboards, and performance analytics) are the same.


Functional Requirements

Core (Interview Focused)

  • Activity Tracking: Track workouts with GPS data and metrics.
  • Social Features: Share activities and follow other athletes.
  • Leaderboards: Compete on segments and routes.
  • Performance Analytics: Analyze workout data and progress.

Out of Scope

  • User authentication and accounts
  • Premium features and subscriptions
  • Device integration and sync
  • Training plans and coaching
  • Mobile app specific features

Non-Functional Requirements

Core (Interview Focused)

  • High availability: 99.9% uptime for activity tracking.
  • Scalability: Handle millions of athletes and activities.
  • Performance: Fast activity upload and leaderboard updates.
  • Data accuracy: Accurate GPS data processing and metrics.

Out of Scope

  • Data retention policies
  • Compliance and privacy regulations

💡 Interview Tip: Focus on high availability, scalability, and performance. Interviewers care most about GPS processing, social features, and leaderboard algorithms.


Core Entities

| Entity | Key Attributes | Notes |
|--------|----------------|-------|
| Activity | activity_id, user_id, activity_type, start_time, duration | Indexed by user_id for user activities |
| GPSPoint | point_id, activity_id, latitude, longitude, timestamp | GPS track points |
| Segment | segment_id, name, start_point, end_point, distance | Competition segments |
| Leaderboard | leaderboard_id, segment_id, activity_id, time, rank | Segment leaderboards |
| User | user_id, username, email, follower_count | User account information |

💡 Interview Tip: Focus on Activity, GPSPoint, and Segment as they drive activity tracking, GPS processing, and leaderboard generation.


Core APIs

Activity Management

  • POST /activities { activity_type, start_time, duration, gps_data } – Upload a new activity
  • GET /activities/{activity_id} – Get activity details
  • GET /activities?user_id=&type=&limit= – List activities with filters
  • DELETE /activities/{activity_id} – Delete an activity

GPS Processing

  • POST /activities/{activity_id}/gps { gps_points[] } – Upload GPS data
  • GET /activities/{activity_id}/gps – Get GPS track data
  • GET /activities/{activity_id}/map – Get activity map visualization

Social Features

  • POST /activities/{activity_id}/like – Like an activity
  • GET /feed?user_id=&limit= – Get user activity feed
  • POST /users/{user_id}/follow – Follow a user
  • GET /users/{user_id}/followers – Get user followers

Leaderboards

  • GET /segments/{segment_id}/leaderboard – Get segment leaderboard
  • GET /segments?location=&radius= – Find segments near location
  • POST /segments { name, start_point, end_point } – Create a segment
  • GET /users/{user_id}/achievements – Get user achievements

High-Level Design

System Architecture Diagram

Key Components

  • Activity Service: Handle activity CRUD operations
  • GPS Processing Service: Process and analyze GPS data
  • Social Service: Manage social features and feeds
  • Leaderboard Service: Generate and maintain leaderboards
  • Segment Service: Manage segments and route matching
  • Database: Persistent storage for activities and user data

Mapping Core Functional Requirements to Components

| Functional Requirement | Responsible Components | Key Considerations |
|------------------------|------------------------|--------------------|
| Activity Tracking | Activity Service, GPS Processing Service | GPS data processing, metrics calculation |
| Social Features | Social Service, Database | Feed generation, social interactions |
| Leaderboards | Leaderboard Service, Segment Service | Segment matching, ranking algorithms |
| Performance Analytics | GPS Processing Service, Database | Data analysis, progress tracking |

Detailed Design

GPS Processing Service

Purpose: Process GPS data and calculate activity metrics.

Key Design Decisions:

  • Data Validation: Validate GPS data for accuracy and completeness
  • Metrics Calculation: Calculate distance, pace, elevation, and other metrics
  • Data Compression: Compress GPS data for storage efficiency
  • Route Matching: Match activities to known segments and routes

Algorithm: GPS data processing

1. Receive GPS track data
2. Validate GPS points:
   - Check coordinate accuracy
   - Remove outliers and noise
   - Fill gaps in track data
3. Calculate metrics:
   - Total distance
   - Average pace
   - Elevation gain/loss
   - Heart rate zones
4. Compress track data:
   - Remove redundant points
   - Use efficient encoding
5. Store processed data
6. Trigger segment matching
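
The validation and metrics steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `TrackPoint` type, the function names, and the 15 m/s speed threshold used for outlier removal are all assumptions made for the example. Gap filling and heart rate zones are left out.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass
class TrackPoint:  # hypothetical shape of one GPS sample
    lat: float        # degrees
    lon: float        # degrees
    t: float          # seconds since activity start
    elevation: float  # metres


def haversine_m(a: TrackPoint, b: TrackPoint) -> float:
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def drop_outliers(points, max_speed_mps=15.0):
    """Keep points whose implied speed from the last kept point is plausible.

    The threshold is an assumed value; a real system would pick it per activity type.
    """
    kept = []
    for p in points:
        if not kept:
            kept.append(p)
            continue
        dt = p.t - kept[-1].t
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        if haversine_m(kept[-1], p) / dt <= max_speed_mps:
            kept.append(p)
    return kept


def summarize(points):
    """Compute distance, elevation gain, duration, and average pace."""
    pts = drop_outliers(points)
    pairs = list(zip(pts, pts[1:]))
    distance = sum(haversine_m(a, b) for a, b in pairs)
    gain = sum(max(0.0, b.elevation - a.elevation) for a, b in pairs)
    duration = pts[-1].t - pts[0].t if len(pts) > 1 else 0.0
    pace = duration / (distance / 1000) if distance > 0 else None
    return {
        "distance_m": distance,
        "elevation_gain_m": gain,
        "duration_s": duration,
        "pace_s_per_km": pace,
    }
```

Filtering on implied speed is only one possible noise heuristic; smoothing elevation before summing gain is another step a real pipeline would likely need.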

Leaderboard Service

Purpose: Generate and maintain leaderboards for segments.

Key Design Decisions:

  • Segment Matching: Match activities to segments accurately
  • Ranking Algorithm: Rank athletes by time or other metrics
  • Near-Real-Time Updates: Update leaderboards shortly after each activity is processed; brief delays are acceptable
  • Historical Data: Maintain historical leaderboard data

Algorithm: Leaderboard generation

1. Receive completed activity
2. Match activity to segments:
   - Check if activity passes through segment
   - Calculate segment time
   - Validate segment completion
3. Update leaderboard:
   - Add new entry to leaderboard
   - Recalculate rankings
   - Update personal records
4. Broadcast updates:
   - Notify followers
   - Update social feeds
   - Send achievement notifications
5. Store leaderboard data
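
Step 3 (update leaderboard, recalculate rankings, update personal records) can be illustrated with a small in-memory structure. The class and method names here are hypothetical, and the choice to keep only each athlete's best time and to let tied times share a rank is an assumption for the sketch. Ranks are computed on read rather than stored.

```python
class SegmentLeaderboard:
    """In-memory leaderboard for one segment (illustrative only)."""

    def __init__(self):
        self._best = {}  # user_id -> (time_s, activity_id)

    def submit(self, user_id, activity_id, time_s):
        """Record an effort; return True if it is a new personal record."""
        current = self._best.get(user_id)
        if current is not None and current[0] <= time_s:
            return False  # existing best is at least as fast
        self._best[user_id] = (time_s, activity_id)
        return True

    def top(self, n=10):
        """Return [(rank, user_id, time_s)], fastest first, ties sharing a rank."""
        ordered = sorted(self._best.items(), key=lambda kv: (kv[1][0], kv[0]))
        result = []
        prev_time, rank = None, 0
        for position, (user_id, (time_s, _)) in enumerate(ordered, start=1):
            if time_s != prev_time:
                rank, prev_time = position, time_s
            result.append((rank, user_id, time_s))
        return result[:n]
```

Sorting on every read is fine for a sketch; at scale the same interface would more plausibly sit on top of an ordered structure so that a new effort does not require re-ranking every entry.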

Social Service

Purpose: Manage social features and activity feeds.

Key Design Decisions:

  • Feed Generation: Generate personalized activity feeds
  • Social Interactions: Handle likes, comments, and follows
  • Privacy Controls: Respect user privacy settings
  • Content Moderation: Moderate user-generated content

Algorithm: Activity feed generation

1. Receive feed request from user
2. Get user's following list
3. Fetch recent activities from followed users
4. Apply privacy filters:
   - Check user privacy settings
   - Filter private activities
5. Rank activities:
   - Recency factor
   - User engagement
   - Activity type preference
6. Return personalized feed
7. Cache feed for performance
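
A minimal sketch of steps 2 through 6, building the feed at read time. The `FeedActivity` type, the `build_feed` function, and the scoring formula (exponential recency decay with a one-day half-life, multiplied by a like-based engagement factor) are assumptions for illustration; activity type preference is omitted.

```python
from dataclasses import dataclass


@dataclass
class FeedActivity:  # hypothetical, trimmed-down activity record
    activity_id: str
    user_id: str
    start_time: float  # Unix seconds
    likes: int
    is_private: bool


def build_feed(following, activities_by_user, now, limit=20, half_life_s=86_400.0):
    """Collect followed users' public activities and rank them.

    following: iterable of user_ids the requester follows
    activities_by_user: dict mapping user_id -> list[FeedActivity]
    """
    # Steps 2-4: gather candidates and drop private activities.
    candidates = [
        a
        for uid in following
        for a in activities_by_user.get(uid, [])
        if not a.is_private
    ]

    # Step 5: assumed score = recency decay * (1 + likes).
    def score(a):
        age = max(0.0, now - a.start_time)
        recency = 0.5 ** (age / half_life_s)
        return recency * (1 + a.likes)

    # Step 6: highest score first, truncated to the page size.
    return sorted(candidates, key=score, reverse=True)[:limit]
```

This is the fan-out-on-read approach; precomputing feeds when an activity is uploaded is the usual alternative when reads far outnumber writes.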

Database Design

Activities Table

| Field | Type | Description |
|-------|------|-------------|
| activity_id | VARCHAR(36) | Primary key |
| user_id | VARCHAR(36) | Activity owner |
| activity_type | VARCHAR(50) | Type of activity |
| start_time | TIMESTAMP | Activity start |
| duration | INT | Duration in seconds |
| distance | DECIMAL(10,2) | Distance covered |
| elevation_gain | DECIMAL(8,2) | Elevation gained |
| created_at | TIMESTAMP | Creation timestamp |

Indexes:

  • idx_user_id on (user_id) - User activities
  • idx_start_time on (start_time) - Recent activities
  • idx_activity_type on (activity_type) - Activity type queries

GPS Points Table

| Field | Type | Description |
|-------|------|-------------|
| point_id | VARCHAR(36) | Primary key |
| activity_id | VARCHAR(36) | Associated activity |
| latitude | DECIMAL(10,8) | Latitude coordinate |
| longitude | DECIMAL(11,8) | Longitude coordinate |
| timestamp | TIMESTAMP | Point timestamp |
| elevation | DECIMAL(8,2) | Elevation at point |

Indexes:

  • idx_activity_timestamp on (activity_id, timestamp) - Activity track
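  • idx_coordinates on (latitude, longitude) - Geospatial queries (a composite index mainly narrows on latitude; a spatial index such as an R-tree or geohash is usually a better fit for radius searches)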

Segments Table

| Field | Type | Description |
|-------|------|-------------|
| segment_id | VARCHAR(36) | Primary key |
| name | VARCHAR(255) | Segment name |
| start_lat | DECIMAL(10,8) | Start latitude |
| start_lng | DECIMAL(11,8) | Start longitude |
| end_lat | DECIMAL(10,8) | End latitude |
| end_lng | DECIMAL(11,8) | End longitude |
| distance | DECIMAL(8,2) | Segment distance |
| created_at | TIMESTAMP | Creation timestamp |

Indexes:

  • idx_start on (start_lat, start_lng) - Start point queries
  • idx_end on (end_lat, end_lng) - End point queries
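
To show how a query such as GET /segments?location=&radius= might use the idx_start index, here is a sketch that first computes a latitude/longitude bounding box (a range the index can scan) and then applies an exact distance check. The function names and the dict-based segment records are hypothetical. The sketch assumes the search circle does not touch a pole or cross the antimeridian.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def bounding_box(lat, lng, radius_m):
    """Return (min_lat, max_lat, min_lng, max_lng) in degrees enclosing the circle."""
    ang = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(ang)
    dlng = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lng - dlng, lng + dlng


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def segments_near(segments, lat, lng, radius_m):
    """Two-phase search: cheap box prefilter, then exact distance on survivors."""
    min_lat, max_lat, min_lng, max_lng = bounding_box(lat, lng, radius_m)
    in_box = [
        s for s in segments
        if min_lat <= s["start_lat"] <= max_lat and min_lng <= s["start_lng"] <= max_lng
    ]
    return [
        s for s in in_box
        if haversine_m(lat, lng, s["start_lat"], s["start_lng"]) <= radius_m
    ]
```

In a database, the box prefilter would become a WHERE clause with two BETWEEN conditions on start_lat and start_lng, and the exact check would run on the much smaller result set.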

Leaderboards Table

| Field | Type | Description |
|-------|------|-------------|
| leaderboard_id | VARCHAR(36) | Primary key |
| segment_id | VARCHAR(36) | Associated segment |
| activity_id | VARCHAR(36) | Associated activity |
| user_id | VARCHAR(36) | Athlete |
| time | INT | Segment time in seconds |
| rank | INT | Leaderboard rank |
| created_at | TIMESTAMP | Entry timestamp |

Indexes:

  • idx_segment_rank on (segment_id, rank) - Segment leaderboard
  • idx_user_id on (user_id) - User achievements

Scalability Considerations

Horizontal Scaling

  • Activity Service: Scale horizontally with load balancers
  • GPS Processing Service: Use consistent hashing for data distribution
  • Social Service: Run stateless instances behind load balancers and partition follow and feed data by user_id
  • Database: Shard activities by user_id
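
As a sketch of how consistent hashing could route a user_id to a shard or processing node, the following ring uses virtual nodes to even out the distribution. The `HashRing` class, the shard names, the SHA-256 hash, and the count of 100 virtual nodes are all assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed on the ring at `vnodes` positions.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, key):
        """Return the first node clockwise from the key's ring position."""
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters for resharding is that adding a node only moves keys onto the new node; keys never shuffle between existing nodes.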

Caching Strategy

  • Redis: Cache leaderboards and activity feeds
  • CDN: Cache static content and images
  • Application Cache: Cache frequently accessed data
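
The cache-aside pattern behind these layers can be shown with an in-memory stand-in. This sketch is not a Redis client: `TTLCache`, its methods, and the injectable clock are hypothetical names chosen for the example, and a dict plays the role of the cache store.

```python
import time


class TTLCache:
    """Cache-aside helper with per-entry expiry (in-memory stand-in for a cache)."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl = ttl_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}      # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, or call loader() and cache its result."""
        now = self._clock()
        hit = self._data.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader()  # cache miss: fall through to the source of truth
        self._data[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry, e.g. after a new effort changes a leaderboard."""
        self._data.pop(key, None)
```

A short TTL bounds how stale a leaderboard or feed can be, while explicit invalidation covers the cases where a write should be visible immediately.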

Performance Optimization

  • Connection Pooling: Efficient database connections
  • Batch Processing: Batch GPS data processing for efficiency
  • Async Processing: Non-blocking activity processing
  • Resource Monitoring: Monitor CPU, memory, and network usage

Monitoring and Observability

Key Metrics

  • Activity Upload Time: Average time to upload activities
  • GPS Processing Time: Average time to process GPS data
  • Leaderboard Update Time: Average time to update leaderboards
  • System Health: CPU, memory, and disk usage

Alerting

  • High Latency: Alert when processing time exceeds threshold
  • GPS Processing Errors: Alert when GPS data processing fails
  • Leaderboard Errors: Alert when leaderboard updates fail
  • System Errors: Alert on activity processing failures

Trade-offs and Considerations

Consistency vs. Availability

  • Choice: Eventual consistency for leaderboards, strong consistency for activities
  • Reasoning: Leaderboards can tolerate slight delays, whereas activities need immediate accuracy

Storage vs. Performance

  • Choice: Use data compression for GPS data
  • Reasoning: Balance between storage costs and query performance
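
One simple form of GPS compression is to quantize coordinates to fixed precision and store the difference from the previous point, since consecutive samples are close together and the deltas are small integers. The sketch below assumes 5 decimal places (roughly one metre) is acceptable; the function names are hypothetical, and a real encoder would go on to pack the integers into a compact byte format.

```python
def encode_track(points, precision=5):
    """Turn [(lat, lng), ...] into integer deltas at the given decimal precision."""
    scale = 10 ** precision
    deltas = []
    prev_lat = prev_lng = 0
    for lat, lng in points:
        ilat, ilng = round(lat * scale), round(lng * scale)
        deltas.append((ilat - prev_lat, ilng - prev_lng))
        prev_lat, prev_lng = ilat, ilng
    return deltas


def decode_track(deltas, precision=5):
    """Inverse of encode_track: accumulate deltas back into coordinates."""
    scale = 10 ** precision
    lat = lng = 0
    points = []
    for dlat, dlng in deltas:
        lat += dlat
        lng += dlng
        points.append((lat / scale, lng / scale))
    return points
```

The first pair carries the absolute position; every later pair is a small offset, which is what makes a subsequent variable-length encoding effective.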

Accuracy vs. Performance

  • Choice: Use approximation algorithms for GPS processing
  • Reasoning: Balance between GPS accuracy and processing speed
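
The text does not name a specific approximation algorithm. One common choice for thinning a track is Douglas-Peucker line simplification, sketched here as an example. It assumes points are already projected to planar (x, y) coordinates in metres, and the function names are hypothetical.

```python
import math


def _distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Drop points that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Keep that point and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```

The tolerance is the accuracy knob: a larger value stores fewer points and processes faster, at the cost of a coarser track. The recursive form is fine for illustration; very long tracks would call for an iterative version to avoid deep recursion.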

Common Interview Questions

Q: How would you handle GPS data accuracy?

A: Validate incoming points, remove outliers and noise, fill gaps in the track, and cross-check against multiple data sources where available.

Q: How do you generate leaderboards efficiently?

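A: Match each activity to segments once, when it is processed, then update only the affected leaderboards. Serve rankings from a cache such as Redis and accept eventual consistency, since leaderboards can tolerate slight delays.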

Q: How would you scale this system globally?

A: Deploy regional processing servers, use geo-distributed databases, and implement data replication strategies.

Q: How do you handle social features at scale?

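A: Build feeds from followed users' recent activities, apply privacy filters, rank by recency and engagement, and cache the result. Shard the underlying data by user_id and scale the services horizontally.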


Key Takeaways

  1. GPS Processing: Data validation and metrics calculation are essential for accurate activity tracking
  2. Leaderboards: Segment matching and ranking algorithms are crucial for competitive features
  3. Social Features: Feed generation and social interactions enhance user engagement
  4. Scalability: Horizontal scaling and partitioning are crucial for handling large-scale fitness data
  5. Monitoring: Comprehensive monitoring ensures system reliability and performance