Language Selection
Choose your preferred programming language
String Compression
Problem Statement
Given an array of characters chars, compress it using the following algorithm:
Begin with an empty string s. For each group of consecutive repeating characters in chars:
- If the group’s length is 1, append the character to
s. - Otherwise, append the character followed by the group’s length.
The compressed string s should not be returned separately, but instead be stored in the input character array chars. Note that group lengths that are 10 or longer will be split into multiple characters in chars.
After you are done modifying the input array, return the new length of the array.
Examples
Example 1:
Input: chars = ["a","a","b","b","c","c","c"]
Output: Return 6, and the first 6 characters of the input array should be: ["a","2","b","2","c","3"]
Explanation: The groups are "aa", "bb", and "ccc". This compresses to "a2b2c3".
Example 2:
Input: chars = ["a"]
Output: Return 1, and the first character of the input array should be: ["a"]
Explanation: The only group is "a", which remains uncompressed since it's a single character.
Example 3:
Input: chars = ["a","b","b","b","b","b","b","b","b","b","b","b","b"]
Output: Return 4, and the first 4 characters of the input array should be: ["a","b","1","2"]
Explanation: The groups are "a" and "bbbbbbbbbbbb". This compresses to "ab12".
Constraints
1 <= chars.length <= 2000chars[i]is a lowercase English letter, uppercase English letter, digit, or symbol.
Approach 1: Two Pointers (Optimal)
Algorithm
Use two pointers: one to read characters and one to write the compressed result.
Steps:
- Initialize read pointer at 0 and write pointer at 0
- For each group of consecutive characters:
- Count the length of the group
- Write the character at write position
- If count > 1, write the count as individual digits
- Move write pointer to next position
- Return the new length (write pointer position)
Implementation
Python:
def compress(chars):
"""
Compress string in-place using two pointers
Time: O(n)
Space: O(1)
"""
if not chars:
return 0
write = 0
read = 0
while read < len(chars):
char = chars[read]
count = 0
# Count consecutive characters
while read < len(chars) and chars[read] == char:
read += 1
count += 1
# Write character
chars[write] = char
write += 1
# Write count if > 1
if count > 1:
for digit in str(count):
chars[write] = digit
write += 1
return write
Java:
class Solution {
/**
* Compress string in-place using two pointers
* Time: O(n)
* Space: O(1)
*/
public int compress(char[] chars) {
if (chars.length == 0) {
return 0;
}
int write = 0;
int read = 0;
while (read < chars.length) {
char currentChar = chars[read];
int count = 0;
// Count consecutive characters
while (read < chars.length && chars[read] == currentChar) {
read++;
count++;
}
// Write character
chars[write] = currentChar;
write++;
// Write count if > 1
if (count > 1) {
String countStr = String.valueOf(count);
for (char digit : countStr.toCharArray()) {
chars[write] = digit;
write++;
}
}
}
return write;
}
}
Go:
func compress(chars []byte) int {
/**
* Compress string in-place using two pointers
* Time: O(n)
* Space: O(1)
*/
if len(chars) == 0 {
return 0
}
write := 0
read := 0
for read < len(chars) {
char := chars[read]
count := 0
// Count consecutive characters
for read < len(chars) && chars[read] == char {
read++
count++
}
// Write character
chars[write] = char
write++
// Write count if > 1
if count > 1 {
countStr := strconv.Itoa(count)
for _, digit := range countStr {
chars[write] = byte(digit)
write++
}
}
}
return write
}
JavaScript:
/**
* Compress string in-place using two pointers
* Time: O(n)
* Space: O(1)
*/
function compress(chars) {
if (chars.length === 0) {
return 0;
}
let write = 0;
let read = 0;
while (read < chars.length) {
const char = chars[read];
let count = 0;
// Count consecutive characters
while (read < chars.length && chars[read] === char) {
read++;
count++;
}
// Write character
chars[write] = char;
write++;
// Write count if > 1
if (count > 1) {
const countStr = count.toString();
for (const digit of countStr) {
chars[write] = digit;
write++;
}
}
}
return write;
}
C#:
public class Solution {
/// <summary>
/// Compress string in-place using two pointers
/// Time: O(n)
/// Space: O(1)
/// </summary>
public int Compress(char[] chars) {
if (chars.Length == 0) {
return 0;
}
int write = 0;
int read = 0;
while (read < chars.Length) {
char currentChar = chars[read];
int count = 0;
// Count consecutive characters
while (read < chars.Length && chars[read] == currentChar) {
read++;
count++;
}
// Write character
chars[write] = currentChar;
write++;
// Write count if > 1
if (count > 1) {
string countStr = count.ToString();
foreach (char digit in countStr) {
chars[write] = digit;
write++;
}
}
}
return write;
}
}
Complexity Analysis
- Time Complexity: O(n) - Single pass through array
- Space Complexity: O(1) - Only using constant extra space
Key Insights
- Two Pointers: Use read and write pointers for in-place compression
- Group Counting: Count consecutive characters efficiently
- Digit Conversion: Convert count to individual digit characters
- In-Place Modification: Modify the original array without extra space
- Length Return: Return the new compressed length
Edge Cases
- Empty Array: Return 0
- Single Character: Return 1, no compression needed
- All Same Characters: Compress to single character with count
- No Consecutive Characters: Each character remains as-is
- Large Counts: Handle counts >= 10 (multiple digits)
Test Cases
def test_compress():
# Test case 1
chars1 = ["a","a","b","b","c","c","c"]
result1 = compress(chars1)
assert result1 == 6
assert chars1[:6] == ["a","2","b","2","c","3"]
# Test case 2
chars2 = ["a"]
result2 = compress(chars2)
assert result2 == 1
assert chars2[:1] == ["a"]
# Test case 3
chars3 = ["a","b","b","b","b","b","b","b","b","b","b","b","b"]
result3 = compress(chars3)
assert result3 == 4
assert chars3[:4] == ["a","b","1","2"]
# Edge cases
chars4 = ["a","b","c"]
result4 = compress(chars4)
assert result4 == 3
assert chars4[:3] == ["a","b","c"]
chars5 = ["a","a","a","a","a","a","a","a","a","a"]
result5 = compress(chars5)
assert result5 == 3
assert chars5[:3] == ["a","1","0"]
print("All tests passed!")
test_compress()
Follow-up Questions
- Decompress String: Given compressed string, decompress it?
- Compress with Different Format: Use different compression format?
- Compress with Lossy: Allow some loss for better compression?
- Compress Multiple Strings: Handle multiple strings efficiently?
- Compress with Dictionary: Use dictionary-based compression?
Common Mistakes
- Not Handling Large Counts: Forgetting that counts >= 10 need multiple digits
- Wrong Write Position: Not updating write pointer correctly
- Not Returning Length: Forgetting to return the new array length
- In-Place Confusion: Not understanding that we modify the original array
- Edge Case Handling: Not handling empty or single character arrays
Interview Tips
- Start with Two Pointers: Most efficient approach
- Explain In-Place: Emphasize that we modify the original array
- Handle Large Counts: Mention that counts >= 10 need multiple digits
- Discuss Trade-offs: Compare different approaches
- Follow-up Ready: Be prepared for decompression and variations
Concept Explanations
Two Pointers Technique: We use a read pointer to scan through the original array and a write pointer to write the compressed result. This allows us to compress the string in-place without using additional space.
Group Identification: We identify groups of consecutive identical characters by scanning through the array and counting how many times each character appears consecutively.
Digit Conversion: When the count is greater than 1, we need to convert the count to individual digit characters. For example, count 12 becomes “1” and “2”.
In-Place Compression: Instead of creating a new array, we modify the original array by overwriting it with the compressed result. This saves space but requires careful pointer management.
Length Return: The function returns the new length of the compressed array, which tells the caller how many characters in the original array are part of the compressed result.
Why This Works: The two-pointers approach ensures that we never overwrite data that we haven’t read yet, making it safe to compress in-place. The read pointer always stays ahead of or equal to the write pointer, preventing data loss.