Design Yelp

System Design Challenge

medium
45-60 minutes
search-engine, geospatial-indexing, recommendation-system, user-reviews

What is Yelp?

Yelp is a local business discovery platform that allows users to find and review local businesses. It's similar to Google Maps, Foursquare, or TripAdvisor. The service provides business search, reviews, ratings, and recommendations.

What sets systems like Yelp apart is the combination of geospatial search with business discovery and reviews. Understanding Yelp prepares you for interview questions about similar local business platforms, because the core design challenges (geospatial search, business discovery, review management, and recommendations) are the same.


Functional Requirements

Core (Interview Focused)

  • Business Search: Users can search for businesses by location and category.
  • Business Discovery: Discover nearby businesses based on location.
  • Review Management: Users can write and read business reviews.
  • Recommendation System: Provide personalized business recommendations.

Out of Scope

  • User authentication and accounts
  • Business owner tools and analytics
  • Reservation and booking systems
  • Payment processing
  • Mobile app specific features

Non-Functional Requirements

Core (Interview Focused)

  • Low latency: Sub-second response time for search queries.
  • High availability: 99.9% uptime for business discovery.
  • Scalability: Handle millions of businesses and reviews.
  • Accuracy: Business information, ratings, and recommendations reflect current data.

Out of Scope

  • Data retention policies
  • Compliance and privacy regulations

💡 Interview Tip: Focus on low latency, high availability, and scalability. Interviewers care most about geospatial search, business discovery, and review management.


Core Entities

Entity | Key Attributes | Notes
--- | --- | ---
Business | business_id, name, category, location, rating, review_count | Indexed by location for geospatial search
Review | review_id, business_id, user_id, rating, content, timestamp | Indexed by business_id for business reviews
User | user_id, username, email, review_count, helpful_votes | User account information
Category | category_id, name, parent_category, business_count | Business categorization
Location | location_id, latitude, longitude, address, city | Geographic location data

💡 Interview Tip: Focus on Business, Review, and Location as they drive business discovery, review management, and geospatial search.


Core APIs

Business Search

  • GET /businesses/search?query=&location=&category=&radius= – Search for businesses
  • GET /businesses/{business_id} – Get business details
  • GET /businesses/nearby?latitude=&longitude=&radius=&category= – Find nearby businesses
  • GET /businesses?category=&city=&limit= – List businesses with filters
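As a rough illustration of how a client might call the nearby endpoint, the sketch below builds the request URL. The host `api.example.com` and the helper name `nearby_url` are hypothetical, and treating `radius` as meters is an assumption, since the API list does not state units.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # hypothetical host, not from the source

def nearby_url(latitude, longitude, radius, category=None):
    """Build a GET /businesses/nearby URL; radius is assumed to be in meters."""
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("coordinates out of range")
    params = {"latitude": latitude, "longitude": longitude, "radius": radius}
    if category:
        params["category"] = category  # optional filter, omitted when empty
    return f"{BASE_URL}/businesses/nearby?{urlencode(params)}"

print(nearby_url(37.7749, -122.4194, 1000, category="coffee"))
```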

Review Management

  • POST /businesses/{business_id}/reviews { rating, content } – Write a business review
  • GET /businesses/{business_id}/reviews?sort=&limit= – Get business reviews
  • PUT /reviews/{review_id} { rating, content } – Update a review
  • DELETE /reviews/{review_id} – Delete a review

User Management

  • GET /users/{user_id} – Get user profile
  • GET /users/{user_id}/reviews – Get user's reviews
  • GET /users/{user_id}/recommendations – Get personalized recommendations
  • POST /reviews/{review_id}/helpful – Mark review as helpful

Categories

  • GET /categories – Get all business categories
  • GET /categories/{category_id}/businesses – Get businesses in category
  • GET /categories/{category_id}/subcategories – Get subcategories
  • GET /categories/search?query= – Search categories

High-Level Design

Key Components

  • Business Service: Handle business CRUD operations
  • Search Service: Process search queries and geospatial search
  • Review Service: Manage business reviews and ratings
  • Recommendation Service: Generate personalized business recommendations
  • Geospatial Service: Handle location-based queries
  • Database: Persistent storage for businesses, reviews, and users

Mapping Core Functional Requirements to Components

Functional Requirement | Responsible Components | Key Considerations
--- | --- | ---
Business Search | Search Service, Geospatial Service | Search algorithms, geospatial indexing
Business Discovery | Geospatial Service, Business Service | Location-based queries, business data
Review Management | Review Service, Database | Review storage, rating calculation
Recommendation System | Recommendation Service, Review Service | Personalization, business ranking

Detailed Design

Search Service

Purpose: Process search queries and provide relevant business results.

Key Design Decisions:

  • Search Algorithms: Use full-text search and geospatial search
  • Result Ranking: Rank results by relevance, rating, and distance
  • Query Processing: Parse and optimize search queries
  • Caching: Cache frequent search results

Algorithm: Business search

1. Receive search query with location
2. Parse query parameters:
   - Search terms
   - Location coordinates
   - Category filters
   - Radius constraints
3. Execute search:
   - Full-text search on business names/descriptions
   - Geospatial search for location-based results
   - Category filtering
4. Rank results by:
   - Text relevance score
   - Business rating
   - Distance from user
   - Review count
5. Return ranked results
6. Cache results for performance
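The ranking step above can be sketched as a weighted sum of the four signals. This is a minimal illustration, not a prescribed formula: the `Candidate` type, the weights, and the normalization choices are all assumptions, and the text relevance score is assumed to arrive from the full-text engine already scaled to 0-1.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    text_score: float    # text relevance in [0, 1], assumed to come from the full-text engine
    rating: float        # average rating on the 1-5 scale
    distance_km: float   # distance from the user
    review_count: int

# Hypothetical weights; a real system would need to tune these.
WEIGHTS = {"text": 0.4, "rating": 0.3, "distance": 0.2, "reviews": 0.1}

def score(c: Candidate, radius_km: float) -> float:
    rating_part = (c.rating - 1) / 4                             # map 1-5 onto 0-1
    distance_part = max(0.0, 1 - c.distance_km / radius_km)      # closer is better
    reviews_part = min(1.0, math.log10(1 + c.review_count) / 3)  # saturates near 1,000 reviews
    return (WEIGHTS["text"] * c.text_score
            + WEIGHTS["rating"] * rating_part
            + WEIGHTS["distance"] * distance_part
            + WEIGHTS["reviews"] * reviews_part)

def rank(candidates, radius_km):
    """Return candidates ordered from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: score(c, radius_km), reverse=True)

near = Candidate("Near Cafe", 0.8, 4.0, 0.5, 120)
far = Candidate("Far Cafe", 0.8, 4.5, 4.5, 30)
print([c.name for c in rank([far, near], radius_km=5)])
```

With these weights the nearer cafe outranks the slightly better-rated one that sits at the edge of the radius; shifting weight from distance to rating would reverse that.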

Geospatial Service

Purpose: Handle location-based queries and geospatial search.

Key Design Decisions:

  • Geospatial Indexing: Use an R-tree or a similar structure (for example a quadtree or geohash-based index) for spatial queries
  • Distance Calculation: Calculate distances efficiently
  • Location Validation: Validate location coordinates
  • Proximity Search: Find businesses within specified radius

Algorithm: Geospatial search

1. Receive location-based query
2. Validate location coordinates
3. Query geospatial index:
   - Find candidate businesses within radius
   - Filter by category if specified
4. Calculate distances:
   - Use Haversine formula for accuracy
   - Cache distance calculations
   - Sort by distance
5. Return proximity-ranked results
6. Update search statistics
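The distance step can be sketched with the Haversine formula as follows. The function names and the `(business_id, latitude, longitude)` tuple layout are illustrative assumptions, and the linear scan stands in for the candidate list a real geospatial index would return.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def within_radius(origin, businesses, radius_km):
    """Return (distance_km, business_id) pairs inside the radius, nearest first."""
    lat, lon = origin
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("invalid coordinates")
    hits = []
    for business_id, blat, blon in businesses:
        d = haversine_km(lat, lon, blat, blon)
        if d <= radius_km:
            hits.append((d, business_id))
    return sorted(hits)

businesses = [("b1", 37.7790, -122.4190), ("b2", 37.8044, -122.2712)]
for dist, business_id in within_radius((37.7749, -122.4194), businesses, radius_km=2):
    print(business_id, round(dist, 2))
```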

Review Service

Purpose: Manage business reviews and calculate ratings.

Key Design Decisions:

  • Review Storage: Store reviews efficiently with metadata
  • Rating Calculation: Calculate business ratings from reviews
  • Review Validation: Validate review content and ratings
  • Review Moderation: Moderate reviews for quality

Algorithm: Review processing

1. Receive review submission
2. Validate review:
   - Check rating range (1-5)
   - Validate content length
   - Check for spam/abuse
3. Store review in database
4. Update business rating:
   - Recalculate average rating
   - Update review count
   - Update rating distribution
5. Update user review count
6. Trigger recommendation updates
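The validation and rating-update steps above could look roughly like this. It is a sketch under assumptions: `BusinessRating`, `validate_review`, `apply_review`, and the 5,000-character limit are hypothetical, and spam detection is left out because the source does not describe how it works.

```python
from dataclasses import dataclass, field

MAX_CONTENT_LENGTH = 5000  # hypothetical limit; the source gives no number

@dataclass
class BusinessRating:
    rating: float = 0.0
    review_count: int = 0
    distribution: dict = field(default_factory=lambda: {s: 0 for s in range(1, 6)})

def validate_review(rating, content):
    """Reject ratings outside 1-5 and empty or oversized content."""
    if not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError("rating must be an integer from 1 to 5")
    if not content.strip() or len(content) > MAX_CONTENT_LENGTH:
        raise ValueError("content is empty or too long")

def apply_review(stats, rating):
    """Fold one new rating into the running average without rescanning all reviews."""
    total = stats.rating * stats.review_count + rating
    stats.review_count += 1
    stats.rating = total / stats.review_count
    stats.distribution[rating] += 1

stats = BusinessRating()
for rating, content in [(5, "Great tacos"), (4, "Good service"), (3, "Okay")]:
    validate_review(rating, content)
    apply_review(stats, rating)
print(stats.rating, stats.review_count)
```

Keeping a running sum of ratings instead of the average is a reasonable alternative, since it avoids accumulating floating-point error over many updates.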

Recommendation Service

Purpose: Generate personalized business recommendations.

Key Design Decisions:

  • Collaborative Filtering: Use user behavior for recommendations
  • Content-based Filtering: Use business attributes for recommendations
  • Hybrid Approach: Combine multiple recommendation methods
  • Real-time Updates: Update recommendations based on user activity

Algorithm: Business recommendation

1. Analyze user preferences:
   - Review history
   - Rating patterns
   - Category preferences
   - Location patterns
2. Find similar users:
   - Users with similar review patterns
   - Users with similar preferences
3. Generate recommendations:
   - Businesses liked by similar users
   - Businesses in preferred categories
   - Businesses in preferred locations
4. Rank recommendations:
   - User preference score
   - Business rating
   - Distance from user
5. Return personalized recommendations
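One way to realize the "find similar users" and "generate recommendations" steps is user-based collaborative filtering with cosine similarity, sketched below. The data layout (`{user: {business_id: rating}}`) and function names are assumptions, and the sketch ignores the category, location, and distance signals listed above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two {business_id: rating} dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=3):
    """Rank businesses the user has not reviewed by similarity-weighted ratings."""
    mine = ratings[user]
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # no overlap with this user, so no signal
        for business_id, rating in theirs.items():
            if business_id not in mine:
                scores[business_id] = scores.get(business_id, 0.0) + sim * rating
                weights[business_id] = weights.get(business_id, 0.0) + sim
    predicted = {b: scores[b] / weights[b] for b in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

ratings = {
    "alice": {"taqueria": 5, "ramen": 4},
    "bob": {"taqueria": 5, "ramen": 5, "bakery": 5},
    "carol": {"taqueria": 1, "diner": 4},
}
print(recommend("alice", ratings))
```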

Database Design

Businesses Table

Field | Type | Description
--- | --- | ---
business_id | VARCHAR(36) | Primary key
name | VARCHAR(255) | Business name
category | VARCHAR(100) | Business category
latitude | DECIMAL(10,8) | Business latitude
longitude | DECIMAL(11,8) | Business longitude
address | TEXT | Business address
city | VARCHAR(100) | Business city
rating | DECIMAL(3,2) | Average rating
review_count | INT | Number of reviews
created_at | TIMESTAMP | Business creation

Indexes:

  • idx_category on (category) - Category-based queries
  • idx_city on (city) - City-based queries
  • idx_rating on (rating) - Rating-based queries
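  • idx_location on (latitude, longitude) - Geospatial queries; a spatial index (such as an R-tree) or a geohash column suits radius searches better, since a composite B-tree narrows the scan mainly on latitude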
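To show what a location index does for a radius query, here is a minimal sketch of a fixed-size grid index. It is a simplified stand-in for the R-tree or geohash structures a production database would use, not a description of either; `GridIndex`, `cell_of`, and the 0.01-degree cell size are assumptions.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # hypothetical cell size, roughly 1.1 km of latitude

def cell_of(lat, lon):
    """Map a coordinate to the integer grid cell that contains it."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

class GridIndex:
    def __init__(self):
        self.cells = defaultdict(list)

    def add(self, business_id, lat, lon):
        self.cells[cell_of(lat, lon)].append((business_id, lat, lon))

    def candidates(self, lat, lon, rings=1):
        """Businesses in the query cell and its neighbors; a coarse prefilter only."""
        cx, cy = cell_of(lat, lon)
        found = []
        for dx in range(-rings, rings + 1):
            for dy in range(-rings, rings + 1):
                found.extend(self.cells.get((cx + dx, cy + dy), []))
        return found

index = GridIndex()
index.add("b1", 37.7790, -122.4190)
index.add("b2", 37.8044, -122.2712)
print([b for b, _, _ in index.candidates(37.7749, -122.4194)])
```

The candidates still need an exact distance check afterwards. The sketch also ignores wraparound at the 180-degree meridian and the fact that longitude cells narrow toward the poles.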

Reviews Table

Field | Type | Description
--- | --- | ---
review_id | VARCHAR(36) | Primary key
business_id | VARCHAR(36) | Associated business
user_id | VARCHAR(36) | Review author
rating | INT | Review rating (1-5)
content | TEXT | Review content
helpful_votes | INT | Helpful votes count
created_at | TIMESTAMP | Review creation

Indexes:

  • idx_business_id on (business_id) - Business reviews
  • idx_user_id on (user_id) - User reviews
  • idx_rating on (rating) - Rating-based queries
  • idx_created_at on (created_at) - Recent reviews

Users Table

Field | Type | Description
--- | --- | ---
user_id | VARCHAR(36) | Primary key
username | VARCHAR(100) | Username
email | VARCHAR(255) | Email address
review_count | INT | Number of reviews
helpful_votes | INT | Helpful votes received
created_at | TIMESTAMP | Account creation

Indexes:

  • idx_username on (username) - Username lookup
  • idx_review_count on (review_count) - Active reviewers

Categories Table

Field | Type | Description
--- | --- | ---
category_id | VARCHAR(36) | Primary key
name | VARCHAR(100) | Category name
parent_category | VARCHAR(100) | Parent category
business_count | INT | Number of businesses

Indexes:

  • idx_name on (name) - Category lookup
  • idx_parent_category on (parent_category) - Subcategories

Scalability Considerations

Horizontal Scaling

  • Business Service: Scale horizontally with load balancers
  • Search Service: Use consistent hashing for search partitioning
  • Review Service: Scale review processing with distributed systems
  • Database: Shard businesses and reviews by geographic regions
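The consistent hashing mentioned for the Search Service can be sketched as a hash ring with virtual nodes. The class name, node names, and the choice of SHA-256 with 100 virtual nodes per server are illustrative assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring; each node is placed at many virtual points."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Return the first node clockwise from the key's position on the ring."""
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

nodes = ["search-1", "search-2", "search-3"]
ring = HashRing(nodes)
bigger = HashRing(nodes + ["search-4"])
keys = [f"biz-{i}" for i in range(200)]
moved = [k for k in keys if ring.node_for(k) != bigger.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

The point of the technique is visible in `moved`: when a node is added, only keys that now belong to the new node change owner, instead of nearly all keys as with `hash(key) % node_count`.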

Caching Strategy

  • Redis: Cache search results and business data
  • CDN: Cache static content and images
  • Application Cache: Cache frequently accessed data
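A cache-aside pattern for search results might look like the sketch below. The in-process `TTLCache` is only a stand-in for Redis so that the example runs on its own; the key format, the 60-second TTL, and rounding coordinates to three decimals (roughly 100 m) are assumptions chosen to let nearby users share cache entries.

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry (a stand-in for Redis)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired, so treat it as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)

def search_cache_key(query, lat, lon, radius_km):
    # Normalize the query and round coordinates so near-identical searches collide.
    return f"search:{query.strip().lower()}:{round(lat, 3)}:{round(lon, 3)}:{radius_km}"

def cached_search(cache, run_search, query, lat, lon, radius_km):
    """Cache-aside: return a cached result if present, otherwise compute and store it."""
    key = search_cache_key(query, lat, lon, radius_km)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = run_search(query, lat, lon, radius_km)
    cache.set(key, result)
    return result

cache = TTLCache(ttl_seconds=60)
lookup = lambda q, lat, lon, r: ["b1", "b7"]  # hypothetical search backend
print(cached_search(cache, lookup, "Coffee ", 37.77491, -122.41938, 1))
```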

Performance Optimization

  • Connection Pooling: Efficient database connections
  • Batch Processing: Batch review updates for efficiency
  • Async Processing: Non-blocking search processing
  • Resource Monitoring: Monitor CPU, memory, and network usage

Monitoring and Observability

Key Metrics

  • Search Latency: Search response time, tracked as an average and as tail percentiles (for example p95 and p99)
  • Review Processing Time: Average time to process reviews
  • Recommendation Accuracy: How often recommended businesses match user interest, measured for example by click-through on recommendations
  • System Health: CPU, memory, and disk usage

Alerting

  • High Latency: Alert when search time exceeds threshold
  • Review Processing Errors: Alert when review processing fails
  • Recommendation Errors: Alert when recommendation generation fails
  • System Errors: Alert on business processing failures
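A high-latency alert can be sketched as a percentile check over recent samples. The 1,000 ms threshold follows from the sub-second search requirement; the nearest-rank percentile method, the choice of p99, and the function names are assumptions.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def latency_alert(samples_ms, threshold_ms=1000, pct=99):
    """Return (should_alert, observed_percentile_ms)."""
    observed = percentile(samples_ms, pct)
    return observed > threshold_ms, observed

healthy = [120] * 98 + [900, 1500]
degraded = [120] * 98 + [1200, 1500]
print(latency_alert(healthy))
print(latency_alert(degraded))
```

Both sample sets have an average under 150 ms, so an alert based on the mean alone would miss the degraded case that the p99 check catches.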

Trade-offs and Considerations

Consistency vs. Availability

  • Choice: Eventual consistency for aggregate ratings, strong consistency for review writes
  • Reasoning: An average rating can lag briefly without harm, but authors expect to see a review immediately after posting it

Latency vs. Accuracy

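  • Choice: Use approximation in geospatial search, such as a coarse index lookup that selects candidates before exact distances are computed
  • Reasoning: Balances search accuracy against response time, since exact distance is only computed for a small candidate set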

Storage vs. Performance

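  • Choice: Store precomputed values such as rating and review_count on each business record, at the cost of extra storage and write work
  • Reasoning: Balances storage costs against query performance, since search reads avoid aggregating reviews at query time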

Common Interview Questions

Q: How would you handle geospatial search at scale?

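A: Use a geospatial index (an R-tree or similar) to find candidates, compute exact distances with the Haversine formula only for those candidates, and partition data by geographic region so that each query touches a small number of shards.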

Q: How do you ensure review quality?

A: Validate each submission (rating range, content length, spam and abuse checks), moderate reviews for quality, and use helpful votes as a user feedback signal.

Q: How would you scale this system globally?

A: Deploy regional search servers, use geo-distributed databases, and implement data replication strategies.

Q: How do you handle business recommendation accuracy?

A: Use multiple recommendation algorithms, user feedback, and continuous learning to improve recommendation accuracy.


Key Takeaways

  1. Geospatial Search: Efficient spatial indexing and distance calculations are essential for location-based search
  2. Review Management: Review validation and rating calculation ensure accurate business information
  3. Recommendation System: Multiple recommendation methods provide better user experience
  4. Scalability: Horizontal scaling and geographic partitioning are crucial for handling large-scale business data
  5. Monitoring: Comprehensive monitoring ensures system reliability and performance