Language Selection
Choose your preferred programming language
Reduce Array Size to Half
Problem Statement
Given an array arr. You can choose a set of integers and remove all the occurrences of these integers in the array.
Return the minimum size of the set so that at least half of the integers of the array are removed.
Input/Output Specifications
- Input:
arr: Array of integers
- Output: Integer representing minimum number of distinct elements to remove
Constraints
1 <= arr.length <= 10^5arr.lengthis even1 <= arr[i] <= 10^5
Examples
Example 1:
Input: arr = [3,3,3,3,5,5,5,2,2,7]
Output: 2
Explanation: Choose {3,7} and remove them. Array becomes [5,5,5,2,2].
Size is 5, which is exactly half of original size 10.
Example 2:
Input: arr = [7,7,7,7,7,7]
Output: 1
Explanation: Choose {7}. Array becomes empty (size 0), which is less than half.
Example 3:
Input: arr = [1,9]
Output: 1
Explanation: Remove any one element to reduce size from 2 to 1.
Solution Approaches
Approach 1: Max-Heap (Greedy)
Algorithm Explanation:
- Count frequency of each element
- Use max-heap to prioritize most frequent elements
- Greedily remove most frequent elements first
- Stop when removed count ≥ half of array size
Implementation:
Python:
import heapq
from collections import Counter
def min_set_size(arr):
"""
Find minimum set size to remove at least half elements
Time: O(n log k) where k is unique elements
Space: O(k)
"""
if not arr:
return 0
# Count frequencies
count = Counter(arr)
target = len(arr) // 2
# Max-heap of frequencies (use negative values)
max_heap = [-freq for freq in count.values()]
heapq.heapify(max_heap)
removed = 0
set_size = 0
while removed < target:
# Get most frequent element count
freq = -heapq.heappop(max_heap)
removed += freq
set_size += 1
return set_size
# Alternative implementation without heap
def min_set_size_sorting(arr):
"""
Solve using sorting approach
Time: O(n log k)
Space: O(k)
"""
count = Counter(arr)
target = len(arr) // 2
# Sort frequencies in descending order
frequencies = sorted(count.values(), reverse=True)
removed = 0
set_size = 0
for freq in frequencies:
removed += freq
set_size += 1
if removed >= target:
break
return set_size
Java:
import java.util.*;
class Solution {
/**
* Find minimum set size using max-heap
* Time: O(n log k) where k is unique elements
* Space: O(k)
*/
public int minSetSize(int[] arr) {
if (arr.length == 0) return 0;
// Count frequencies
Map<Integer, Integer> count = new HashMap<>();
for (int num : arr) {
count.put(num, count.getOrDefault(num, 0) + 1);
}
// Max-heap of frequencies
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
maxHeap.addAll(count.values());
int target = arr.length / 2;
int removed = 0;
int setSize = 0;
while (removed < target) {
removed += maxHeap.poll();
setSize++;
}
return setSize;
}
}
Go:
import (
"container/heap"
"sort"
)
type MaxHeap []int
func (h MaxHeap) Len() int { return len(h) }
func (h MaxHeap) Less(i, j int) bool { return h[i] > h[j] }
func (h MaxHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *MaxHeap) Push(x interface{}) {
*h = append(*h, x.(int))
}
func (h *MaxHeap) Pop() interface{} {
old := *h
n := len(old)
x := old[n-1]
*h = old[0 : n-1]
return x
}
// minSetSize - Find minimum set size using max-heap
// Time: O(n log k), Space: O(k)
func minSetSize(arr []int) int {
if len(arr) == 0 {
return 0
}
// Count frequencies
count := make(map[int]int)
for _, num := range arr {
count[num]++
}
// Build max-heap of frequencies
maxHeap := &MaxHeap{}
heap.Init(maxHeap)
for _, freq := range count {
heap.Push(maxHeap, freq)
}
target := len(arr) / 2
removed := 0
setSize := 0
for removed < target {
freq := heap.Pop(maxHeap).(int)
removed += freq
setSize++
}
return setSize
}
// Alternative using sorting
func minSetSizeSorting(arr []int) int {
count := make(map[int]int)
for _, num := range arr {
count[num]++
}
frequencies := make([]int, 0, len(count))
for _, freq := range count {
frequencies = append(frequencies, freq)
}
sort.Slice(frequencies, func(i, j int) bool {
return frequencies[i] > frequencies[j]
})
target := len(arr) / 2
removed := 0
setSize := 0
for _, freq := range frequencies {
removed += freq
setSize++
if removed >= target {
break
}
}
return setSize
}
JavaScript:
/**
* Find minimum set size using sorting approach
* Time: O(n log k) where k is unique elements
* Space: O(k)
*/
function minSetSize(arr) {
if (arr.length === 0) return 0;
// Count frequencies
const count = new Map();
for (const num of arr) {
count.set(num, (count.get(num) || 0) + 1);
}
// Sort frequencies in descending order
const frequencies = Array.from(count.values()).sort((a, b) => b - a);
const target = Math.floor(arr.length / 2);
let removed = 0;
let setSize = 0;
for (const freq of frequencies) {
removed += freq;
setSize++;
if (removed >= target) {
break;
}
}
return setSize;
}
Approach 2: Bucket Sort (Optimal)
Algorithm Explanation:
- Count frequencies of all elements
- Create buckets where bucket[i] contains elements with frequency i
- Process buckets from highest frequency to lowest
- Greedily select elements until target reached
Implementation:
Python:
def min_set_size_bucket(arr):
"""
Find minimum set size using bucket sort
Time: O(n)
Space: O(n)
"""
count = Counter(arr)
n = len(arr)
target = n // 2
# Create frequency buckets
bucket = [[] for _ in range(n + 1)]
for num, freq in count.items():
bucket[freq].append(num)
removed = 0
set_size = 0
# Process from highest frequency to lowest
for freq in range(n, 0, -1):
if not bucket[freq]:
continue
for num in bucket[freq]:
removed += freq
set_size += 1
if removed >= target:
return set_size
return set_size
Key Insights
- Greedy Strategy: Always remove most frequent elements first
- Frequency Priority: Higher frequency elements reduce array size faster
- Target Calculation: Need to remove at least n/2 elements
- Early Termination: Stop as soon as target is reached
Edge Cases
- Minimum Array: Array of length 2
- All Same Elements: Single element type
- All Different: Each element appears once
- Even Distribution: All elements have same frequency
Test Cases
def test_min_set_size():
# Example 1
assert min_set_size([3,3,3,3,5,5,5,2,2,7]) == 2
# Example 2 - all same
assert min_set_size([7,7,7,7,7,7]) == 1
# Example 3 - minimum case
assert min_set_size([1,9]) == 1
# All different elements
assert min_set_size([1,2,3,4]) == 2
# Even distribution
assert min_set_size([1,1,2,2,3,3]) == 2
# Test bucket approach
assert min_set_size_bucket([3,3,3,3,5,5,5,2,2,7]) == 2
assert min_set_size_bucket([7,7,7,7,7,7]) == 1
print("All tests passed!")
test_min_set_size()
Interview Tips
- Greedy Insight: Explain why removing most frequent elements first is optimal
- Frequency Counting: Start with counting element frequencies
- Target Understanding: Clarify that we need to remove at least half
- Alternative Approaches: Mention sorting vs heap vs bucket sort
- Optimization: Bucket sort achieves O(n) time complexity
Follow-up Questions
- Exact Half: What if we need to remove exactly half?
- Minimize Distinct: Minimize number of distinct elements remaining
- Weight-based: Elements have different weights
- Multiple Operations: Allow other operations besides removal
- Online Algorithm: Handle stream of elements
Common Mistakes
- Wrong Target: Confusing “at least half” with “exactly half”
- Frequency Calculation: Not counting frequencies correctly
- Greedy Proof: Not understanding why greedy approach works
- Edge Cases: Not handling minimum array size properly
- Early Termination: Not stopping when target is reached
Concept Explanations
Greedy Optimality: Removing the most frequent elements first is optimal because each removal operation should eliminate as many array elements as possible to minimize the number of distinct elements we need to remove.
Frequency-Based Priority: The insight is that we’re not just removing elements, we’re removing all occurrences of chosen elements. Therefore, prioritizing by frequency maximizes the impact of each removal.
When to Apply: This pattern applies to optimization problems where you need to select a subset of items to maximize some cumulative benefit. The greedy approach works when the benefit function (in this case, frequency) allows for optimal substructure.