Language Selection
Choose your preferred programming language
Sort Characters by Frequency
Problem Statement
Given a string s, sort it in decreasing order based on the frequency of the characters. The frequency of a character is the number of times it appears in the string.
Return the sorted string. If there are multiple answers, return any of them.
Input/Output Specifications
- Input:
s: String consisting of uppercase/lowercase letters and digits
- Output: String sorted by character frequency (descending)
Constraints
1 <= s.length <= 5 * 10^5sconsists of uppercase and lowercase English letters and digits
Examples
Example 1:
Input: s = "tree"
Output: "eert" or "eetr"
Explanation: 'e' appears twice, 't' and 'r' appear once.
Example 2:
Input: s = "cccaaa"
Output: "aaaccc" or "cccaaa"
Explanation: Both 'a' and 'c' appear 3 times. Order doesn't matter for same frequency.
Example 3:
Input: s = "Aabb"
Output: "bbAa" or "bbaA"
Explanation: 'b' appears twice, 'A' and 'a' appear once each.
Solution Approaches
Approach 1: Max-Heap (Priority Queue)
Algorithm Explanation:
- Count frequency of each character
- Add all (frequency, character) pairs to max-heap
- Extract from heap and build result string
- Repeat each character according to its frequency
Implementation:
Python:
import heapq
from collections import Counter
def frequency_sort_heap(s):
"""
Sort characters by frequency using max-heap
Time: O(n log k) where k is unique characters
Space: O(n)
"""
if not s:
return ""
# Count character frequencies
count = Counter(s)
# Max-heap (use negative frequencies for min-heap)
max_heap = [(-freq, char) for char, freq in count.items()]
heapq.heapify(max_heap)
result = []
while max_heap:
neg_freq, char = heapq.heappop(max_heap)
freq = -neg_freq
result.append(char * freq)
return ''.join(result)
Java:
import java.util.*;
class Solution {
/**
* Sort characters by frequency using max-heap
* Time: O(n log k) where k is unique characters
* Space: O(n)
*/
public String frequencySort(String s) {
if (s == null || s.isEmpty()) {
return "";
}
// Count frequencies
Map<Character, Integer> count = new HashMap<>();
for (char c : s.toCharArray()) {
count.put(c, count.getOrDefault(c, 0) + 1);
}
// Max-heap based on frequency
PriorityQueue<Map.Entry<Character, Integer>> maxHeap =
new PriorityQueue<>((a, b) -> b.getValue() - a.getValue());
maxHeap.addAll(count.entrySet());
StringBuilder result = new StringBuilder();
while (!maxHeap.isEmpty()) {
Map.Entry<Character, Integer> entry = maxHeap.poll();
char character = entry.getKey();
int frequency = entry.getValue();
for (int i = 0; i < frequency; i++) {
result.append(character);
}
}
return result.toString();
}
}
Approach 2: Bucket Sort (Optimal)
Algorithm Explanation:
- Count character frequencies
- Create buckets where bucket[i] contains characters with frequency i
- Iterate from highest frequency to lowest, building result
Implementation:
Python:
def frequency_sort_bucket(s):
"""
Sort characters by frequency using bucket sort
Time: O(n)
Space: O(n)
"""
if not s:
return ""
# Count frequencies
count = Counter(s)
n = len(s)
# Create buckets: bucket[i] = list of chars with frequency i
bucket = [[] for _ in range(n + 1)]
for char, freq in count.items():
bucket[freq].append(char)
# Build result from highest frequency to lowest
result = []
for freq in range(n, 0, -1):
for char in bucket[freq]:
result.append(char * freq)
return ''.join(result)
Approach 3: Sorting with Custom Comparator
Implementation:
Python:
def frequency_sort_sorting(s):
"""
Sort using custom comparator
Time: O(n log k) where k is unique characters
Space: O(n)
"""
count = Counter(s)
# Sort characters by frequency (descending)
sorted_chars = sorted(count.items(), key=lambda x: x[1], reverse=True)
result = []
for char, freq in sorted_chars:
result.append(char * freq)
return ''.join(result)
Key Insights
- Frequency Mapping: Always start by counting character frequencies
- Max-Heap Efficiency: Good when number of unique characters is small
- Bucket Sort Optimization: O(n) time using frequency as bucket index
- Character Repetition: Build result by repeating each character freq times
Edge Cases
- Empty String: Return empty string
- Single Character: Return same string
- All Same Frequency: Any order acceptable for ties
- All Different Characters: Reverse alphabetical or any order
- Case Sensitivity: ‘A’ and ‘a’ are different characters
Test Cases
def test_frequency_sort():
# Example 1
result1 = frequency_sort_heap("tree")
assert result1.count('e') == 2 and result1.count('t') == 1 and result1.count('r') == 1
assert result1.index('e') < result1.index('t') and result1.index('e') < result1.index('r')
# Example 2
result2 = frequency_sort_heap("cccaaa")
assert len(result2) == 6 and (result2.startswith("ccc") or result2.startswith("aaa"))
# Single character
assert frequency_sort_heap("a") == "a"
# Empty string
assert frequency_sort_heap("") == ""
# All different
result3 = frequency_sort_heap("abc")
assert len(result3) == 3 and set(result3) == {'a', 'b', 'c'}
# Test bucket sort
result4 = frequency_sort_bucket("tree")
assert result4.count('e') == 2
print("All tests passed!")
test_frequency_sort()
Interview Tips
- Count First: Always start by counting character frequencies
- Discuss Approaches: Heap vs bucket sort vs sorting
- Tie Handling: Clarify behavior for same frequencies
- Optimization: Bucket sort is optimal for this problem
- Edge Cases: Empty string, single character, all same frequency
Follow-up Questions
- Kth Most Frequent: Find kth most frequent character
- Stable Sort: Maintain relative order for same frequencies
- Streaming Data: Handle characters arriving continuously
- Memory Constraint: What if result string doesn’t fit in memory?
- Custom Frequency: Use different weighting for different characters
Common Mistakes
- Wrong Heap Type: Using min-heap instead of max-heap
- Character Building: Not repeating characters according to frequency
- Frequency Extraction: Incorrectly handling frequency counting
- Edge Cases: Not handling empty string properly
- Tie Breaking: Not considering what to do with equal frequencies
Concept Explanations
Frequency-Based Sorting: The key insight is to transform the problem from sorting individual characters to sorting character types by their frequency, then reconstructing the string.
Bucket Sort Advantage: Since frequencies are bounded by string length, we can use frequency as a direct index into buckets, achieving linear time complexity.
When to Apply: Use this pattern for any problem involving frequency-based reordering, whether with characters, numbers, or other elements. The bucket sort optimization is particularly powerful when the range of frequencies is limited.