Sort Characters by Frequency

Sort characters in string by frequency using heap or bucket sort

Language Selection

Choose your preferred programming language

Showing: Python

Sort Characters by Frequency

Problem Statement

Given a string s, sort it in decreasing order based on the frequency of the characters. The frequency of a character is the number of times it appears in the string.

Return the sorted string. If there are multiple answers, return any of them.

Input/Output Specifications

  • Input:
    • s: String consisting of uppercase/lowercase letters and digits
  • Output: String sorted by character frequency (descending)

Constraints

  • 1 <= s.length <= 5 * 10^5
  • s consists of uppercase and lowercase English letters and digits

Examples

Example 1:

Input: s = "tree"
Output: "eert" or "eetr"
Explanation: 'e' appears twice, 't' and 'r' appear once.

Example 2:

Input: s = "cccaaa"
Output: "aaaccc" or "cccaaa"
Explanation: Both 'a' and 'c' appear 3 times. Order doesn't matter for same frequency.

Example 3:

Input: s = "Aabb"
Output: "bbAa" or "bbaA"
Explanation: 'b' appears twice, 'A' and 'a' appear once each.

Solution Approaches

Approach 1: Max-Heap (Priority Queue)

Algorithm Explanation:

  1. Count frequency of each character
  2. Add all (frequency, character) pairs to max-heap
  3. Extract from heap and build result string
  4. Repeat each character according to its frequency

Implementation:

Python:

import heapq
from collections import Counter

def frequency_sort_heap(s):
    """
    Sort characters by frequency using max-heap
    Time: O(n log k) where k is unique characters
    Space: O(n)
    """
    if not s:
        return ""
    
    # Count character frequencies
    count = Counter(s)
    
    # Max-heap (use negative frequencies for min-heap)
    max_heap = [(-freq, char) for char, freq in count.items()]
    heapq.heapify(max_heap)
    
    result = []
    
    while max_heap:
        neg_freq, char = heapq.heappop(max_heap)
        freq = -neg_freq
        result.append(char * freq)
    
    return ''.join(result)

Java:

import java.util.*;

class Solution {
    /**
     * Sort characters by frequency using max-heap
     * Time: O(n log k) where k is unique characters
     * Space: O(n)
     */
    public String frequencySort(String s) {
        if (s == null || s.isEmpty()) {
            return "";
        }
        
        // Count frequencies
        Map<Character, Integer> count = new HashMap<>();
        for (char c : s.toCharArray()) {
            count.put(c, count.getOrDefault(c, 0) + 1);
        }
        
        // Max-heap based on frequency
        PriorityQueue<Map.Entry<Character, Integer>> maxHeap = 
            new PriorityQueue<>((a, b) -> b.getValue() - a.getValue());
        
        maxHeap.addAll(count.entrySet());
        
        StringBuilder result = new StringBuilder();
        
        while (!maxHeap.isEmpty()) {
            Map.Entry<Character, Integer> entry = maxHeap.poll();
            char character = entry.getKey();
            int frequency = entry.getValue();
            
            for (int i = 0; i < frequency; i++) {
                result.append(character);
            }
        }
        
        return result.toString();
    }
}

Approach 2: Bucket Sort (Optimal)

Algorithm Explanation:

  1. Count character frequencies
  2. Create buckets where bucket[i] contains characters with frequency i
  3. Iterate from highest frequency to lowest, building result

Implementation:

Python:

def frequency_sort_bucket(s):
    """
    Sort characters by frequency using bucket sort
    Time: O(n)
    Space: O(n)
    """
    if not s:
        return ""
    
    # Count frequencies
    count = Counter(s)
    n = len(s)
    
    # Create buckets: bucket[i] = list of chars with frequency i
    bucket = [[] for _ in range(n + 1)]
    
    for char, freq in count.items():
        bucket[freq].append(char)
    
    # Build result from highest frequency to lowest
    result = []
    for freq in range(n, 0, -1):
        for char in bucket[freq]:
            result.append(char * freq)
    
    return ''.join(result)

Approach 3: Sorting with Custom Comparator

Implementation:

Python:

def frequency_sort_sorting(s):
    """
    Sort using custom comparator
    Time: O(n log k) where k is unique characters  
    Space: O(n)
    """
    count = Counter(s)
    
    # Sort characters by frequency (descending)
    sorted_chars = sorted(count.items(), key=lambda x: x[1], reverse=True)
    
    result = []
    for char, freq in sorted_chars:
        result.append(char * freq)
    
    return ''.join(result)

Key Insights

  • Frequency Mapping: Always start by counting character frequencies
  • Max-Heap Efficiency: Good when number of unique characters is small
  • Bucket Sort Optimization: O(n) time using frequency as bucket index
  • Character Repetition: Build result by repeating each character freq times

Edge Cases

  1. Empty String: Return empty string
  2. Single Character: Return same string
  3. All Same Frequency: Any order acceptable for ties
  4. All Different Characters: Reverse alphabetical or any order
  5. Case Sensitivity: ‘A’ and ‘a’ are different characters

Test Cases

def test_frequency_sort():
    # Example 1
    result1 = frequency_sort_heap("tree")
    assert result1.count('e') == 2 and result1.count('t') == 1 and result1.count('r') == 1
    assert result1.index('e') < result1.index('t') and result1.index('e') < result1.index('r')
    
    # Example 2
    result2 = frequency_sort_heap("cccaaa")
    assert len(result2) == 6 and (result2.startswith("ccc") or result2.startswith("aaa"))
    
    # Single character
    assert frequency_sort_heap("a") == "a"
    
    # Empty string
    assert frequency_sort_heap("") == ""
    
    # All different
    result3 = frequency_sort_heap("abc")
    assert len(result3) == 3 and set(result3) == {'a', 'b', 'c'}
    
    # Test bucket sort
    result4 = frequency_sort_bucket("tree")
    assert result4.count('e') == 2
    
    print("All tests passed!")

test_frequency_sort()

Interview Tips

  1. Count First: Always start by counting character frequencies
  2. Discuss Approaches: Heap vs bucket sort vs sorting
  3. Tie Handling: Clarify behavior for same frequencies
  4. Optimization: Bucket sort is optimal for this problem
  5. Edge Cases: Empty string, single character, all same frequency

Follow-up Questions

  1. Kth Most Frequent: Find kth most frequent character
  2. Stable Sort: Maintain relative order for same frequencies
  3. Streaming Data: Handle characters arriving continuously
  4. Memory Constraint: What if result string doesn’t fit in memory?
  5. Custom Frequency: Use different weighting for different characters

Common Mistakes

  1. Wrong Heap Type: Using min-heap instead of max-heap
  2. Character Building: Not repeating characters according to frequency
  3. Frequency Extraction: Incorrectly handling frequency counting
  4. Edge Cases: Not handling empty string properly
  5. Tie Breaking: Not considering what to do with equal frequencies

Concept Explanations

Frequency-Based Sorting: The key insight is to transform the problem from sorting individual characters to sorting character types by their frequency, then reconstructing the string.

Bucket Sort Advantage: Since frequencies are bounded by string length, we can use frequency as a direct index into buckets, achieving linear time complexity.

When to Apply: Use this pattern for any problem involving frequency-based reordering, whether with characters, numbers, or other elements. The bucket sort optimization is particularly powerful when the range of frequencies is limited.