Language Selection
Choose your preferred programming language
Find Duplicate Number
Problem Statement
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.
Input/Output Specifications
- Input: An array
numswhere1 <= nums.length <= 3 * 10^4and1 <= nums[i] <= nums.length - 1 - Output: The duplicate number
Constraints
- You must not modify the array (assume the array is read only)
- You must use only constant, O(1) extra space
- Your runtime complexity should be less than O(n²)
- There is only one duplicate number in the array, but it could be repeated more than once
Examples
Example 1:
Input: nums = [1,3,4,2,2]
Output: 2
Explanation: The duplicate number is 2.
Example 2:
Input: nums = [3,1,3,4,2]
Output: 3
Explanation: The duplicate number is 3.
Example 3:
Input: nums = [1,1]
Output: 1
Explanation: The duplicate number is 1.
Solution Approaches
Approach 1: Floyd’s Cycle Detection Algorithm (Optimal)
Algorithm Explanation: This problem can be solved using Floyd’s cycle detection algorithm by treating the array as a linked list where each index points to the value at that index. Since there’s a duplicate, there will be a cycle in this “linked list”.
- Use two pointers: slow and fast
- Move slow pointer one step at a time, fast pointer two steps at a time
- When they meet, reset one pointer to the beginning
- Move both pointers one step at a time until they meet again
- The meeting point is the duplicate number
Implementation:
Python:
def findDuplicate(nums):
"""
Find duplicate using Floyd's cycle detection algorithm
Time: O(n)
Space: O(1)
"""
# Phase 1: Find the intersection point in the cycle
slow = fast = nums[0]
# Move slow one step, fast two steps
while True:
slow = nums[slow]
fast = nums[nums[fast]]
if slow == fast:
break
# Phase 2: Find the entrance to the cycle
slow = nums[0]
while slow != fast:
slow = nums[slow]
fast = nums[fast]
return slow
Java:
class Solution {
/**
* Find duplicate using Floyd's cycle detection algorithm
* Time: O(n)
* Space: O(1)
*/
public int findDuplicate(int[] nums) {
// Phase 1: Find the intersection point in the cycle
int slow = nums[0];
int fast = nums[0];
// Move slow one step, fast two steps
do {
slow = nums[slow];
fast = nums[nums[fast]];
} while (slow != fast);
// Phase 2: Find the entrance to the cycle
slow = nums[0];
while (slow != fast) {
slow = nums[slow];
fast = nums[fast];
}
return slow;
}
}
Go:
// findDuplicate - Find duplicate using Floyd's cycle detection algorithm
// Time: O(n)
// Space: O(1)
func findDuplicate(nums []int) int {
// Phase 1: Find the intersection point in the cycle
slow := nums[0]
fast := nums[0]
// Move slow one step, fast two steps
for {
slow = nums[slow]
fast = nums[nums[fast]]
if slow == fast {
break
}
}
// Phase 2: Find the entrance to the cycle
slow = nums[0]
for slow != fast {
slow = nums[slow]
fast = nums[fast]
}
return slow
}
JavaScript:
/**
* Find duplicate using Floyd's cycle detection algorithm
* Time: O(n)
* Space: O(1)
*/
function findDuplicate(nums) {
// Phase 1: Find the intersection point in the cycle
let slow = nums[0];
let fast = nums[0];
// Move slow one step, fast two steps
do {
slow = nums[slow];
fast = nums[nums[fast]];
} while (slow !== fast);
// Phase 2: Find the entrance to the cycle
slow = nums[0];
while (slow !== fast) {
slow = nums[slow];
fast = nums[fast];
}
return slow;
}
C#:
public class Solution {
/// <summary>
/// Find duplicate using Floyd's cycle detection algorithm
/// Time: O(n)
/// Space: O(1)
/// </summary>
public int FindDuplicate(int[] nums) {
// Phase 1: Find the intersection point in the cycle
int slow = nums[0];
int fast = nums[0];
// Move slow one step, fast two steps
do {
slow = nums[slow];
fast = nums[nums[fast]];
} while (slow != fast);
// Phase 2: Find the entrance to the cycle
slow = nums[0];
while (slow != fast) {
slow = nums[slow];
fast = nums[fast];
}
return slow;
}
}
Complexity Analysis:
- Time Complexity: O(n) - Each element is visited at most twice
- Space Complexity: O(1) - Only using constant extra space
Trade-offs:
- Optimal time and space complexity
- Clever algorithm that treats array as linked list
- Requires understanding of cycle detection
- Not immediately intuitive
Approach 2: Binary Search on Answer
Algorithm Explanation: Since we know the answer is between 1 and n, we can use binary search to find the duplicate number by counting how many numbers are less than or equal to the middle value.
- Binary search on the range [1, n]
- For each mid value, count how many numbers in the array are <= mid
- If count > mid, the duplicate is in the left half, otherwise right half
- Continue until we find the duplicate
Implementation:
Python:
def findDuplicateBinarySearch(nums):
"""
Find duplicate using binary search on answer
Time: O(n log n)
Space: O(1)
"""
left, right = 1, len(nums) - 1
while left < right:
mid = (left + right) // 2
# Count numbers <= mid
count = 0
for num in nums:
if num <= mid:
count += 1
# If count > mid, duplicate is in left half
if count > mid:
right = mid
else:
left = mid + 1
return left
Java:
class Solution {
/**
* Find duplicate using binary search on answer
* Time: O(n log n)
* Space: O(1)
*/
public int findDuplicateBinarySearch(int[] nums) {
int left = 1, right = nums.length - 1;
while (left < right) {
int mid = left + (right - left) / 2;
// Count numbers <= mid
int count = 0;
for (int num : nums) {
if (num <= mid) {
count++;
}
}
// If count > mid, duplicate is in left half
if (count > mid) {
right = mid;
} else {
left = mid + 1;
}
}
return left;
}
}
Go:
// findDuplicateBinarySearch - Find duplicate using binary search on answer
// Time: O(n log n)
// Space: O(1)
func findDuplicateBinarySearch(nums []int) int {
left, right := 1, len(nums)-1
for left < right {
mid := left + (right-left)/2
// Count numbers <= mid
count := 0
for _, num := range nums {
if num <= mid {
count++
}
}
// If count > mid, duplicate is in left half
if count > mid {
right = mid
} else {
left = mid + 1
}
}
return left
}
JavaScript:
/**
* Find duplicate using binary search on answer
* Time: O(n log n)
* Space: O(1)
*/
function findDuplicateBinarySearch(nums) {
let left = 1, right = nums.length - 1;
while (left < right) {
const mid = Math.floor((left + right) / 2);
// Count numbers <= mid
let count = 0;
for (const num of nums) {
if (num <= mid) {
count++;
}
}
// If count > mid, duplicate is in left half
if (count > mid) {
right = mid;
} else {
left = mid + 1;
}
}
return left;
}
C#:
public class Solution {
/// <summary>
/// Find duplicate using binary search on answer
/// Time: O(n log n)
/// Space: O(1)
/// </summary>
public int FindDuplicateBinarySearch(int[] nums) {
int left = 1, right = nums.Length - 1;
while (left < right) {
int mid = left + (right - left) / 2;
// Count numbers <= mid
int count = 0;
foreach (int num in nums) {
if (num <= mid) {
count++;
}
}
// If count > mid, duplicate is in left half
if (count > mid) {
right = mid;
} else {
left = mid + 1;
}
}
return left;
}
}
Complexity Analysis:
- Time Complexity: O(n log n) - Binary search takes O(log n) iterations, each counting takes O(n)
- Space Complexity: O(1) - Only using constant extra space
Trade-offs:
- Simpler to understand than Floyd’s algorithm
- Higher time complexity
- Still maintains O(1) space complexity
- More intuitive approach
Approach 3: Hash Set (Not Optimal for Constraints)
Algorithm Explanation: Use a hash set to track seen numbers. When we encounter a number that’s already in the set, it’s the duplicate.
Implementation:
Python:
def findDuplicateHashSet(nums):
"""
Find duplicate using hash set
Time: O(n)
Space: O(n) - violates space constraint
"""
seen = set()
for num in nums:
if num in seen:
return num
seen.add(num)
return -1 # Should never reach here
Java:
class Solution {
/**
* Find duplicate using hash set
* Time: O(n)
* Space: O(n) - violates space constraint
*/
public int findDuplicateHashSet(int[] nums) {
Set<Integer> seen = new HashSet<>();
for (int num : nums) {
if (seen.contains(num)) {
return num;
}
seen.add(num);
}
return -1; // Should never reach here
}
}
Complexity Analysis:
- Time Complexity: O(n) - Single pass through array
- Space Complexity: O(n) - Violates the O(1) space constraint
Trade-offs:
- Simple and intuitive
- Violates space constraint
- Good for understanding the problem
- Not suitable for interview constraints
Key Insights
- Cycle Detection: The array can be viewed as a linked list where nums[i] points to nums[nums[i]]
- Pigeonhole Principle: With n+1 numbers in range [1,n], at least one number must be duplicated
- Binary Search on Answer: When we know the answer range, binary search can be applied
- Constraint Awareness: The O(1) space constraint eliminates hash-based solutions
Edge Cases
- Single Duplicate:
[1,1]→ Returns1 - Duplicate at End:
[1,2,3,4,4]→ Returns4 - Duplicate at Beginning:
[2,2,3,4,5]→ Returns2 - Multiple Occurrences:
[1,3,4,2,2]→ Returns2 - Large Array: Arrays with maximum constraints
How Solutions Handle Edge Cases:
- All approaches handle single duplicates correctly
- Floyd’s algorithm works regardless of duplicate position
- Binary search approach is robust for all cases
- Hash set approach is simple but violates constraints
Test Cases
def test_findDuplicate():
# Basic case
assert findDuplicate([1,3,4,2,2]) == 2
# Duplicate at different positions
assert findDuplicate([3,1,3,4,2]) == 3
# Single duplicate
assert findDuplicate([1,1]) == 1
# Duplicate at end
assert findDuplicate([1,2,3,4,4]) == 4
# Duplicate at beginning
assert findDuplicate([2,2,3,4,5]) == 2
# Large array
large_nums = list(range(1, 1000)) + [500]
assert findDuplicate(large_nums) == 500
print("All tests passed!")
test_findDuplicate()
Follow-up Questions
- Multiple Duplicates: What if there are multiple duplicate numbers?
- Modify Array: What if you were allowed to modify the array?
- Find All Duplicates: How would you find all duplicate numbers?
- Missing and Duplicate: What if there’s both a missing and duplicate number?
- Streaming Data: How would you handle this in a stream of data?
Common Mistakes
Violating Space Constraint: Using hash set or array to track seen numbers
- Problem: Uses O(n) extra space
- Solution: Use Floyd’s algorithm or binary search
Modifying Array: Sorting or marking elements as negative
- Problem: Violates “read-only” constraint
- Solution: Use algorithms that don’t modify input
Wrong Binary Search Range: Using [0, n] instead of [1, n]
- Problem: Incorrect range for binary search
- Solution: Remember values are in range [1, n]
Infinite Loop in Floyd’s: Not handling the cycle detection correctly
- Problem: Algorithm doesn’t terminate
- Solution: Ensure proper phase separation
Interview Tips
- Start with Constraints: Always clarify the constraints first
- Discuss Approaches: Mention hash set first, then explain why it doesn’t work
- Explain Floyd’s Algorithm: Draw the linked list representation
- Binary Search Alternative: Show the binary search approach as well
- Edge Cases: Discuss how each approach handles edge cases
Concept Explanations
Cycle Detection in Arrays: This problem demonstrates how arrays can be treated as linked lists for cycle detection. The key insight is that nums[i] represents a pointer to nums[nums[i]].
Binary Search on Answer: When we know the answer lies in a specific range, we can use binary search to find it efficiently, even if the problem doesn’t seem to involve searching.
Pigeonhole Principle: With n+1 numbers in range [1,n], the pigeonhole principle guarantees at least one duplicate exists, which is fundamental to understanding why this problem always has a solution.