String Compression

Compress a string by replacing consecutive characters with character and count.

Language Selection

Choose your preferred programming language

Showing: Python

String Compression

Problem Statement

Given an array of characters chars, compress it using the following algorithm:

Begin with an empty string s. For each group of consecutive repeating characters in chars:

If the group’s length is 1, append the character to s.
Otherwise, append the character followed by the group’s length.

The compressed string s should not be returned separately, but instead be stored in the input character array chars. Note that group lengths that are 10 or longer will be split into multiple characters in chars.

After you are done modifying the input array, return the new length of the array.

Examples

Example 1:

Input: chars = ["a","a","b","b","c","c","c"]
Output: Return 6, and the first 6 characters of the input array should be: ["a","2","b","2","c","3"]
Explanation: The groups are "aa", "bb", and "ccc". This compresses to "a2b2c3".

Example 2:

Input: chars = ["a"]
Output: Return 1, and the first character of the input array should be: ["a"]
Explanation: The only group is "a", which remains uncompressed since it's a single character.

Example 3:

Input: chars = ["a","b","b","b","b","b","b","b","b","b","b","b","b"]
Output: Return 4, and the first 4 characters of the input array should be: ["a","b","1","2"]
Explanation: The groups are "a" and "bbbbbbbbbbbb". This compresses to "ab12".

Constraints

1 <= chars.length <= 2000
chars[i] is a lowercase English letter, uppercase English letter, digit, or symbol.

Approach 1: Two Pointers (Optimal)

Algorithm

Use two pointers: one to read characters and one to write the compressed result.

Steps:

Initialize read pointer at 0 and write pointer at 0
For each group of consecutive characters:
- Count the length of the group
- Write the character at write position
- If count > 1, write the count as individual digits
- Move write pointer to next position
Return the new length (write pointer position)

Implementation

Python:

def compress(chars):
    """
    Compress string in-place using two pointers
    Time: O(n)
    Space: O(1)
    """
    if not chars:
        return 0
    
    write = 0
    read = 0
    
    while read < len(chars):
        char = chars[read]
        count = 0
        
        # Count consecutive characters
        while read < len(chars) and chars[read] == char:
            read += 1
            count += 1
        
        # Write character
        chars[write] = char
        write += 1
        
        # Write count if > 1
        if count > 1:
            for digit in str(count):
                chars[write] = digit
                write += 1
    
    return write

Java:

class Solution {
    /**
     * Compress string in-place using two pointers
     * Time: O(n)
     * Space: O(1)
     */
    public int compress(char[] chars) {
        if (chars.length == 0) {
            return 0;
        }
        
        int write = 0;
        int read = 0;
        
        while (read < chars.length) {
            char currentChar = chars[read];
            int count = 0;
            
            // Count consecutive characters
            while (read < chars.length && chars[read] == currentChar) {
                read++;
                count++;
            }
            
            // Write character
            chars[write] = currentChar;
            write++;
            
            // Write count if > 1
            if (count > 1) {
                String countStr = String.valueOf(count);
                for (char digit : countStr.toCharArray()) {
                    chars[write] = digit;
                    write++;
                }
            }
        }
        
        return write;
    }
}

Go:

func compress(chars []byte) int {
    /**
     * Compress string in-place using two pointers
     * Time: O(n)
     * Space: O(1)
     */
    if len(chars) == 0 {
        return 0
    }
    
    write := 0
    read := 0
    
    for read < len(chars) {
        char := chars[read]
        count := 0
        
        // Count consecutive characters
        for read < len(chars) && chars[read] == char {
            read++
            count++
        }
        
        // Write character
        chars[write] = char
        write++
        
        // Write count if > 1
        if count > 1 {
            countStr := strconv.Itoa(count)
            for _, digit := range countStr {
                chars[write] = byte(digit)
                write++
            }
        }
    }
    
    return write
}

JavaScript:

/**
 * Compress string in-place using two pointers
 * Time: O(n)
 * Space: O(1)
 */
function compress(chars) {
    if (chars.length === 0) {
        return 0;
    }
    
    let write = 0;
    let read = 0;
    
    while (read < chars.length) {
        const char = chars[read];
        let count = 0;
        
        // Count consecutive characters
        while (read < chars.length && chars[read] === char) {
            read++;
            count++;
        }
        
        // Write character
        chars[write] = char;
        write++;
        
        // Write count if > 1
        if (count > 1) {
            const countStr = count.toString();
            for (const digit of countStr) {
                chars[write] = digit;
                write++;
            }
        }
    }
    
    return write;
}

C#:

public class Solution {
    /// <summary>
    /// Compress string in-place using two pointers
    /// Time: O(n)
    /// Space: O(1)
    /// </summary>
    public int Compress(char[] chars) {
        if (chars.Length == 0) {
            return 0;
        }
        
        int write = 0;
        int read = 0;
        
        while (read < chars.Length) {
            char currentChar = chars[read];
            int count = 0;
            
            // Count consecutive characters
            while (read < chars.Length && chars[read] == currentChar) {
                read++;
                count++;
            }
            
            // Write character
            chars[write] = currentChar;
            write++;
            
            // Write count if > 1
            if (count > 1) {
                string countStr = count.ToString();
                foreach (char digit in countStr) {
                    chars[write] = digit;
                    write++;
                }
            }
        }
        
        return write;
    }
}

Complexity Analysis

Time Complexity: O(n) - Single pass through array
Space Complexity: O(1) - Only using constant extra space

Key Insights

Two Pointers: Use read and write pointers for in-place compression
Group Counting: Count consecutive characters efficiently
Digit Conversion: Convert count to individual digit characters
In-Place Modification: Modify the original array without extra space
Length Return: Return the new compressed length

Edge Cases

Empty Array: Return 0
Single Character: Return 1, no compression needed
All Same Characters: Compress to single character with count
No Consecutive Characters: Each character remains as-is
Large Counts: Handle counts >= 10 (multiple digits)

Test Cases

def test_compress():
    # Test case 1
    chars1 = ["a","a","b","b","c","c","c"]
    result1 = compress(chars1)
    assert result1 == 6
    assert chars1[:6] == ["a","2","b","2","c","3"]
    
    # Test case 2
    chars2 = ["a"]
    result2 = compress(chars2)
    assert result2 == 1
    assert chars2[:1] == ["a"]
    
    # Test case 3
    chars3 = ["a","b","b","b","b","b","b","b","b","b","b","b","b"]
    result3 = compress(chars3)
    assert result3 == 4
    assert chars3[:4] == ["a","b","1","2"]
    
    # Edge cases
    chars4 = ["a","b","c"]
    result4 = compress(chars4)
    assert result4 == 3
    assert chars4[:3] == ["a","b","c"]
    
    chars5 = ["a","a","a","a","a","a","a","a","a","a"]
    result5 = compress(chars5)
    assert result5 == 3
    assert chars5[:3] == ["a","1","0"]
    
    print("All tests passed!")

test_compress()

Follow-up Questions

Decompress String: Given compressed string, decompress it?
Compress with Different Format: Use different compression format?
Compress with Lossy: Allow some loss for better compression?
Compress Multiple Strings: Handle multiple strings efficiently?
Compress with Dictionary: Use dictionary-based compression?

Common Mistakes

Not Handling Large Counts: Forgetting that counts >= 10 need multiple digits
Wrong Write Position: Not updating write pointer correctly
Not Returning Length: Forgetting to return the new array length
In-Place Confusion: Not understanding that we modify the original array
Edge Case Handling: Not handling empty or single character arrays

Interview Tips

Start with Two Pointers: Most efficient approach
Explain In-Place: Emphasize that we modify the original array
Handle Large Counts: Mention that counts >= 10 need multiple digits
Discuss Trade-offs: Compare different approaches
Follow-up Ready: Be prepared for decompression and variations

Concept Explanations

Two Pointers Technique: We use a read pointer to scan through the original array and a write pointer to write the compressed result. This allows us to compress the string in-place without using additional space.

Group Identification: We identify groups of consecutive identical characters by scanning through the array and counting how many times each character appears consecutively.

Digit Conversion: When the count is greater than 1, we need to convert the count to individual digit characters. For example, count 12 becomes “1” and “2”.

In-Place Compression: Instead of creating a new array, we modify the original array by overwriting it with the compressed result. This saves space but requires careful pointer management.

Length Return: The function returns the new length of the compressed array, which tells the caller how many characters in the original array are part of the compressed result.

Why This Works: The two-pointers approach ensures that we never overwrite data that we haven’t read yet, making it safe to compress in-place. The read pointer always stays ahead of or equal to the write pointer, preventing data loss.