Find Duplicate Number

Find the duplicate number in an array containing n+1 integers where each integer is between 1 and n (inclusive)

Language Selection

Choose your preferred programming language

Showing: Python

Find Duplicate Number

Problem Statement

Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

Input/Output Specifications

  • Input: An array nums where 1 <= nums.length <= 3 * 10^4 and 1 <= nums[i] <= nums.length - 1
  • Output: The duplicate number

Constraints

  • You must not modify the array (assume the array is read only)
  • You must use only constant, O(1) extra space
  • Your runtime complexity should be less than O(n²)
  • There is only one duplicate number in the array, but it could be repeated more than once

Examples

Example 1:

Input: nums = [1,3,4,2,2]
Output: 2
Explanation: The duplicate number is 2.

Example 2:

Input: nums = [3,1,3,4,2]
Output: 3
Explanation: The duplicate number is 3.

Example 3:

Input: nums = [1,1]
Output: 1
Explanation: The duplicate number is 1.

Solution Approaches

Approach 1: Floyd’s Cycle Detection Algorithm (Optimal)

Algorithm Explanation: This problem can be solved using Floyd’s cycle detection algorithm by treating the array as a linked list where each index points to the value at that index. Since there’s a duplicate, there will be a cycle in this “linked list”.

  1. Use two pointers: slow and fast
  2. Move slow pointer one step at a time, fast pointer two steps at a time
  3. When they meet, reset one pointer to the beginning
  4. Move both pointers one step at a time until they meet again
  5. The meeting point is the duplicate number

Implementation:

Python:

def findDuplicate(nums):
    """
    Find duplicate using Floyd's cycle detection algorithm
    Time: O(n)
    Space: O(1)
    """
    # Phase 1: Find the intersection point in the cycle
    slow = fast = nums[0]
    
    # Move slow one step, fast two steps
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    
    # Phase 2: Find the entrance to the cycle
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    
    return slow

Java:

class Solution {
    /**
     * Find duplicate using Floyd's cycle detection algorithm
     * Time: O(n)
     * Space: O(1)
     */
    public int findDuplicate(int[] nums) {
        // Phase 1: Find the intersection point in the cycle
        int slow = nums[0];
        int fast = nums[0];
        
        // Move slow one step, fast two steps
        do {
            slow = nums[slow];
            fast = nums[nums[fast]];
        } while (slow != fast);
        
        // Phase 2: Find the entrance to the cycle
        slow = nums[0];
        while (slow != fast) {
            slow = nums[slow];
            fast = nums[fast];
        }
        
        return slow;
    }
}

Go:

// findDuplicate - Find duplicate using Floyd's cycle detection algorithm
// Time: O(n)
// Space: O(1)
func findDuplicate(nums []int) int {
    // Phase 1: Find the intersection point in the cycle
    slow := nums[0]
    fast := nums[0]
    
    // Move slow one step, fast two steps
    for {
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast {
            break
        }
    }
    
    // Phase 2: Find the entrance to the cycle
    slow = nums[0]
    for slow != fast {
        slow = nums[slow]
        fast = nums[fast]
    }
    
    return slow
}

JavaScript:

/**
 * Find duplicate using Floyd's cycle detection algorithm
 * Time: O(n)
 * Space: O(1)
 */
function findDuplicate(nums) {
    // Phase 1: Find the intersection point in the cycle
    let slow = nums[0];
    let fast = nums[0];
    
    // Move slow one step, fast two steps
    do {
        slow = nums[slow];
        fast = nums[nums[fast]];
    } while (slow !== fast);
    
    // Phase 2: Find the entrance to the cycle
    slow = nums[0];
    while (slow !== fast) {
        slow = nums[slow];
        fast = nums[fast];
    }
    
    return slow;
}

C#:

public class Solution {
    /// <summary>
    /// Find duplicate using Floyd's cycle detection algorithm
    /// Time: O(n)
    /// Space: O(1)
    /// </summary>
    public int FindDuplicate(int[] nums) {
        // Phase 1: Find the intersection point in the cycle
        int slow = nums[0];
        int fast = nums[0];
        
        // Move slow one step, fast two steps
        do {
            slow = nums[slow];
            fast = nums[nums[fast]];
        } while (slow != fast);
        
        // Phase 2: Find the entrance to the cycle
        slow = nums[0];
        while (slow != fast) {
            slow = nums[slow];
            fast = nums[fast];
        }
        
        return slow;
    }
}

Complexity Analysis:

  • Time Complexity: O(n) - Each element is visited at most twice
  • Space Complexity: O(1) - Only using constant extra space

Trade-offs:

  • Optimal time and space complexity
  • Clever algorithm that treats array as linked list
  • Requires understanding of cycle detection
  • Not immediately intuitive

Approach 2: Binary Search on Answer

Algorithm Explanation: Since we know the answer is between 1 and n, we can use binary search to find the duplicate number by counting how many numbers are less than or equal to the middle value.

  1. Binary search on the range [1, n]
  2. For each mid value, count how many numbers in the array are <= mid
  3. If count > mid, the duplicate is in the left half, otherwise right half
  4. Continue until we find the duplicate

Implementation:

Python:

def findDuplicateBinarySearch(nums):
    """
    Find duplicate using binary search on answer
    Time: O(n log n)
    Space: O(1)
    """
    left, right = 1, len(nums) - 1
    
    while left < right:
        mid = (left + right) // 2
        
        # Count numbers <= mid
        count = 0
        for num in nums:
            if num <= mid:
                count += 1
        
        # If count > mid, duplicate is in left half
        if count > mid:
            right = mid
        else:
            left = mid + 1
    
    return left

Java:

class Solution {
    /**
     * Find duplicate using binary search on answer
     * Time: O(n log n)
     * Space: O(1)
     */
    public int findDuplicateBinarySearch(int[] nums) {
        int left = 1, right = nums.length - 1;
        
        while (left < right) {
            int mid = left + (right - left) / 2;
            
            // Count numbers <= mid
            int count = 0;
            for (int num : nums) {
                if (num <= mid) {
                    count++;
                }
            }
            
            // If count > mid, duplicate is in left half
            if (count > mid) {
                right = mid;
            } else {
                left = mid + 1;
            }
        }
        
        return left;
    }
}

Go:

// findDuplicateBinarySearch - Find duplicate using binary search on answer
// Time: O(n log n)
// Space: O(1)
func findDuplicateBinarySearch(nums []int) int {
    left, right := 1, len(nums)-1
    
    for left < right {
        mid := left + (right-left)/2
        
        // Count numbers <= mid
        count := 0
        for _, num := range nums {
            if num <= mid {
                count++
            }
        }
        
        // If count > mid, duplicate is in left half
        if count > mid {
            right = mid
        } else {
            left = mid + 1
        }
    }
    
    return left
}

JavaScript:

/**
 * Find duplicate using binary search on answer
 * Time: O(n log n)
 * Space: O(1)
 */
function findDuplicateBinarySearch(nums) {
    let left = 1, right = nums.length - 1;
    
    while (left < right) {
        const mid = Math.floor((left + right) / 2);
        
        // Count numbers <= mid
        let count = 0;
        for (const num of nums) {
            if (num <= mid) {
                count++;
            }
        }
        
        // If count > mid, duplicate is in left half
        if (count > mid) {
            right = mid;
        } else {
            left = mid + 1;
        }
    }
    
    return left;
}

C#:

public class Solution {
    /// <summary>
    /// Find duplicate using binary search on answer
    /// Time: O(n log n)
    /// Space: O(1)
    /// </summary>
    public int FindDuplicateBinarySearch(int[] nums) {
        int left = 1, right = nums.Length - 1;
        
        while (left < right) {
            int mid = left + (right - left) / 2;
            
            // Count numbers <= mid
            int count = 0;
            foreach (int num in nums) {
                if (num <= mid) {
                    count++;
                }
            }
            
            // If count > mid, duplicate is in left half
            if (count > mid) {
                right = mid;
            } else {
                left = mid + 1;
            }
        }
        
        return left;
    }
}

Complexity Analysis:

  • Time Complexity: O(n log n) - Binary search takes O(log n) iterations, each counting takes O(n)
  • Space Complexity: O(1) - Only using constant extra space

Trade-offs:

  • Simpler to understand than Floyd’s algorithm
  • Higher time complexity
  • Still maintains O(1) space complexity
  • More intuitive approach

Approach 3: Hash Set (Not Optimal for Constraints)

Algorithm Explanation: Use a hash set to track seen numbers. When we encounter a number that’s already in the set, it’s the duplicate.

Implementation:

Python:

def findDuplicateHashSet(nums):
    """
    Find duplicate using hash set
    Time: O(n)
    Space: O(n) - violates space constraint
    """
    seen = set()
    for num in nums:
        if num in seen:
            return num
        seen.add(num)
    return -1  # Should never reach here

Java:

class Solution {
    /**
     * Find duplicate using hash set
     * Time: O(n)
     * Space: O(n) - violates space constraint
     */
    public int findDuplicateHashSet(int[] nums) {
        Set<Integer> seen = new HashSet<>();
        for (int num : nums) {
            if (seen.contains(num)) {
                return num;
            }
            seen.add(num);
        }
        return -1; // Should never reach here
    }
}

Complexity Analysis:

  • Time Complexity: O(n) - Single pass through array
  • Space Complexity: O(n) - Violates the O(1) space constraint

Trade-offs:

  • Simple and intuitive
  • Violates space constraint
  • Good for understanding the problem
  • Not suitable for interview constraints

Key Insights

  • Cycle Detection: The array can be viewed as a linked list where nums[i] points to nums[nums[i]]
  • Pigeonhole Principle: With n+1 numbers in range [1,n], at least one number must be duplicated
  • Binary Search on Answer: When we know the answer range, binary search can be applied
  • Constraint Awareness: The O(1) space constraint eliminates hash-based solutions

Edge Cases

  1. Single Duplicate: [1,1] → Returns 1
  2. Duplicate at End: [1,2,3,4,4] → Returns 4
  3. Duplicate at Beginning: [2,2,3,4,5] → Returns 2
  4. Multiple Occurrences: [1,3,4,2,2] → Returns 2
  5. Large Array: Arrays with maximum constraints

How Solutions Handle Edge Cases:

  • All approaches handle single duplicates correctly
  • Floyd’s algorithm works regardless of duplicate position
  • Binary search approach is robust for all cases
  • Hash set approach is simple but violates constraints

Test Cases

def test_findDuplicate():
    # Basic case
    assert findDuplicate([1,3,4,2,2]) == 2
    
    # Duplicate at different positions
    assert findDuplicate([3,1,3,4,2]) == 3
    
    # Single duplicate
    assert findDuplicate([1,1]) == 1
    
    # Duplicate at end
    assert findDuplicate([1,2,3,4,4]) == 4
    
    # Duplicate at beginning
    assert findDuplicate([2,2,3,4,5]) == 2
    
    # Large array
    large_nums = list(range(1, 1000)) + [500]
    assert findDuplicate(large_nums) == 500
    
    print("All tests passed!")

test_findDuplicate()

Follow-up Questions

  1. Multiple Duplicates: What if there are multiple duplicate numbers?
  2. Modify Array: What if you were allowed to modify the array?
  3. Find All Duplicates: How would you find all duplicate numbers?
  4. Missing and Duplicate: What if there’s both a missing and duplicate number?
  5. Streaming Data: How would you handle this in a stream of data?

Common Mistakes

  1. Violating Space Constraint: Using hash set or array to track seen numbers

    • Problem: Uses O(n) extra space
    • Solution: Use Floyd’s algorithm or binary search
  2. Modifying Array: Sorting or marking elements as negative

    • Problem: Violates “read-only” constraint
    • Solution: Use algorithms that don’t modify input
  3. Wrong Binary Search Range: Using [0, n] instead of [1, n]

    • Problem: Incorrect range for binary search
    • Solution: Remember values are in range [1, n]
  4. Infinite Loop in Floyd’s: Not handling the cycle detection correctly

    • Problem: Algorithm doesn’t terminate
    • Solution: Ensure proper phase separation

Interview Tips

  1. Start with Constraints: Always clarify the constraints first
  2. Discuss Approaches: Mention hash set first, then explain why it doesn’t work
  3. Explain Floyd’s Algorithm: Draw the linked list representation
  4. Binary Search Alternative: Show the binary search approach as well
  5. Edge Cases: Discuss how each approach handles edge cases

Concept Explanations

Cycle Detection in Arrays: This problem demonstrates how arrays can be treated as linked lists for cycle detection. The key insight is that nums[i] represents a pointer to nums[nums[i]].

Binary Search on Answer: When we know the answer lies in a specific range, we can use binary search to find it efficiently, even if the problem doesn’t seem to involve searching.

Pigeonhole Principle: With n+1 numbers in range [1,n], the pigeonhole principle guarantees at least one duplicate exists, which is fundamental to understanding why this problem always has a solution.