Reduce Array Size to Half

Find minimum removals to reduce array size by half using greedy heap approach

Language Selection

Choose your preferred programming language

Showing: Python

Reduce Array Size to Half

Problem Statement

Given an array arr. You can choose a set of integers and remove all the occurrences of these integers in the array.

Return the minimum size of the set so that at least half of the integers of the array are removed.

Input/Output Specifications

  • Input:
    • arr: Array of integers
  • Output: Integer representing minimum number of distinct elements to remove

Constraints

  • 1 <= arr.length <= 10^5
  • arr.length is even
  • 1 <= arr[i] <= 10^5

Examples

Example 1:

Input: arr = [3,3,3,3,5,5,5,2,2,7]
Output: 2
Explanation: Choose {3,7} and remove them. Array becomes [5,5,5,2,2].
Size is 5, which is exactly half of original size 10.

Example 2:

Input: arr = [7,7,7,7,7,7]
Output: 1
Explanation: Choose {7}. Array becomes empty (size 0), which is less than half.

Example 3:

Input: arr = [1,9]
Output: 1
Explanation: Remove any one element to reduce size from 2 to 1.

Solution Approaches

Approach 1: Max-Heap (Greedy)

Algorithm Explanation:

  1. Count frequency of each element
  2. Use max-heap to prioritize most frequent elements
  3. Greedily remove most frequent elements first
  4. Stop when removed count ≥ half of array size

Implementation:

Python:

import heapq
from collections import Counter

def min_set_size(arr):
    """
    Find minimum set size to remove at least half elements
    Time: O(n log k) where k is unique elements
    Space: O(k)
    """
    if not arr:
        return 0
    
    # Count frequencies
    count = Counter(arr)
    target = len(arr) // 2
    
    # Max-heap of frequencies (use negative values)
    max_heap = [-freq for freq in count.values()]
    heapq.heapify(max_heap)
    
    removed = 0
    set_size = 0
    
    while removed < target:
        # Get most frequent element count
        freq = -heapq.heappop(max_heap)
        removed += freq
        set_size += 1
    
    return set_size

# Alternative implementation without heap
def min_set_size_sorting(arr):
    """
    Solve using sorting approach
    Time: O(n log k)
    Space: O(k)
    """
    count = Counter(arr)
    target = len(arr) // 2
    
    # Sort frequencies in descending order
    frequencies = sorted(count.values(), reverse=True)
    
    removed = 0
    set_size = 0
    
    for freq in frequencies:
        removed += freq
        set_size += 1
        if removed >= target:
            break
    
    return set_size

Java:

import java.util.*;

class Solution {
    /**
     * Find minimum set size using max-heap
     * Time: O(n log k) where k is unique elements
     * Space: O(k)
     */
    public int minSetSize(int[] arr) {
        if (arr.length == 0) return 0;
        
        // Count frequencies
        Map<Integer, Integer> count = new HashMap<>();
        for (int num : arr) {
            count.put(num, count.getOrDefault(num, 0) + 1);
        }
        
        // Max-heap of frequencies
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        maxHeap.addAll(count.values());
        
        int target = arr.length / 2;
        int removed = 0;
        int setSize = 0;
        
        while (removed < target) {
            removed += maxHeap.poll();
            setSize++;
        }
        
        return setSize;
    }
}

Go:

import (
    "container/heap"
    "sort"
)

type MaxHeap []int

func (h MaxHeap) Len() int           { return len(h) }
func (h MaxHeap) Less(i, j int) bool { return h[i] > h[j] }
func (h MaxHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }

func (h *MaxHeap) Push(x interface{}) {
    *h = append(*h, x.(int))
}

func (h *MaxHeap) Pop() interface{} {
    old := *h
    n := len(old)
    x := old[n-1]
    *h = old[0 : n-1]
    return x
}

// minSetSize - Find minimum set size using max-heap
// Time: O(n log k), Space: O(k)
func minSetSize(arr []int) int {
    if len(arr) == 0 {
        return 0
    }
    
    // Count frequencies
    count := make(map[int]int)
    for _, num := range arr {
        count[num]++
    }
    
    // Build max-heap of frequencies
    maxHeap := &MaxHeap{}
    heap.Init(maxHeap)
    
    for _, freq := range count {
        heap.Push(maxHeap, freq)
    }
    
    target := len(arr) / 2
    removed := 0
    setSize := 0
    
    for removed < target {
        freq := heap.Pop(maxHeap).(int)
        removed += freq
        setSize++
    }
    
    return setSize
}

// Alternative using sorting
func minSetSizeSorting(arr []int) int {
    count := make(map[int]int)
    for _, num := range arr {
        count[num]++
    }
    
    frequencies := make([]int, 0, len(count))
    for _, freq := range count {
        frequencies = append(frequencies, freq)
    }
    
    sort.Slice(frequencies, func(i, j int) bool {
        return frequencies[i] > frequencies[j]
    })
    
    target := len(arr) / 2
    removed := 0
    setSize := 0
    
    for _, freq := range frequencies {
        removed += freq
        setSize++
        if removed >= target {
            break
        }
    }
    
    return setSize
}

JavaScript:

/**
 * Find minimum set size using sorting approach
 * Time: O(n log k) where k is unique elements
 * Space: O(k)
 */
function minSetSize(arr) {
    if (arr.length === 0) return 0;
    
    // Count frequencies
    const count = new Map();
    for (const num of arr) {
        count.set(num, (count.get(num) || 0) + 1);
    }
    
    // Sort frequencies in descending order
    const frequencies = Array.from(count.values()).sort((a, b) => b - a);
    
    const target = Math.floor(arr.length / 2);
    let removed = 0;
    let setSize = 0;
    
    for (const freq of frequencies) {
        removed += freq;
        setSize++;
        if (removed >= target) {
            break;
        }
    }
    
    return setSize;
}

Approach 2: Bucket Sort (Optimal)

Algorithm Explanation:

  1. Count frequencies of all elements
  2. Create buckets where bucket[i] contains elements with frequency i
  3. Process buckets from highest frequency to lowest
  4. Greedily select elements until target reached

Implementation:

Python:

def min_set_size_bucket(arr):
    """
    Find minimum set size using bucket sort
    Time: O(n)
    Space: O(n)
    """
    count = Counter(arr)
    n = len(arr)
    target = n // 2
    
    # Create frequency buckets
    bucket = [[] for _ in range(n + 1)]
    for num, freq in count.items():
        bucket[freq].append(num)
    
    removed = 0
    set_size = 0
    
    # Process from highest frequency to lowest
    for freq in range(n, 0, -1):
        if not bucket[freq]:
            continue
        
        for num in bucket[freq]:
            removed += freq
            set_size += 1
            if removed >= target:
                return set_size
    
    return set_size

Key Insights

  • Greedy Strategy: Always remove most frequent elements first
  • Frequency Priority: Higher frequency elements reduce array size faster
  • Target Calculation: Need to remove at least n/2 elements
  • Early Termination: Stop as soon as target is reached

Edge Cases

  1. Minimum Array: Array of length 2
  2. All Same Elements: Single element type
  3. All Different: Each element appears once
  4. Even Distribution: All elements have same frequency

Test Cases

def test_min_set_size():
    # Example 1
    assert min_set_size([3,3,3,3,5,5,5,2,2,7]) == 2
    
    # Example 2 - all same
    assert min_set_size([7,7,7,7,7,7]) == 1
    
    # Example 3 - minimum case
    assert min_set_size([1,9]) == 1
    
    # All different elements
    assert min_set_size([1,2,3,4]) == 2
    
    # Even distribution
    assert min_set_size([1,1,2,2,3,3]) == 2
    
    # Test bucket approach
    assert min_set_size_bucket([3,3,3,3,5,5,5,2,2,7]) == 2
    assert min_set_size_bucket([7,7,7,7,7,7]) == 1
    
    print("All tests passed!")

test_min_set_size()

Interview Tips

  1. Greedy Insight: Explain why removing most frequent elements first is optimal
  2. Frequency Counting: Start with counting element frequencies
  3. Target Understanding: Clarify that we need to remove at least half
  4. Alternative Approaches: Mention sorting vs heap vs bucket sort
  5. Optimization: Bucket sort achieves O(n) time complexity

Follow-up Questions

  1. Exact Half: What if we need to remove exactly half?
  2. Minimize Distinct: Minimize number of distinct elements remaining
  3. Weight-based: Elements have different weights
  4. Multiple Operations: Allow other operations besides removal
  5. Online Algorithm: Handle stream of elements

Common Mistakes

  1. Wrong Target: Confusing “at least half” with “exactly half”
  2. Frequency Calculation: Not counting frequencies correctly
  3. Greedy Proof: Not understanding why greedy approach works
  4. Edge Cases: Not handling minimum array size properly
  5. Early Termination: Not stopping when target is reached

Concept Explanations

Greedy Optimality: Removing the most frequent elements first is optimal because each removal operation should eliminate as many array elements as possible to minimize the number of distinct elements we need to remove.

Frequency-Based Priority: The insight is that we’re not just removing elements, we’re removing all occurrences of chosen elements. Therefore, prioritizing by frequency maximizes the impact of each removal.

When to Apply: This pattern applies to optimization problems where you need to select a subset of items to maximize some cumulative benefit. The greedy approach works when the benefit function (in this case, frequency) allows for optimal substructure.