nicolemon's solution

to Run Length Encoding in the Python Track

Published at Jul 13 2018 · 0 comments

Note:

This solution was written on an old version of Exercism. The tests below might not correspond to the solution code, and the exercise may have changed since this code was written.

Implement run-length encoding and decoding.

Run-length encoding (RLE) is a simple form of data compression, where runs (sequences of identical consecutive data elements) are replaced by a single data value and a count.

For example, the 53 characters below can be represented with only 13.

"WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB"  ->  "12WB12W3B24WB"

RLE allows the original data to be perfectly reconstructed from the compressed data, which makes it a lossless compression scheme.

"AABCCCDEEEE"  ->  "2AB3CD4E"  ->  "AABCCCDEEEE"

For simplicity, you can assume that the unencoded string will only contain the letters A through Z (either lower or upper case) and whitespace. This way data to be encoded will never contain any numbers and numbers inside data to be decoded always represent the count for the following character.
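
As a minimal sketch of the scheme described above (not the solution published on this page), the encoder can be written with itertools.groupby and the decoder with a regular-expression substitution. The names rle_encode and rle_decode are hypothetical, chosen so they do not clash with the exercise's encode and decode.

```python
import re
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into '<count><char>'.

    Runs of length 1 are written without a count, as in the examples above.
    """
    pieces = []
    for char, run in groupby(text):
        length = len(list(run))
        pieces.append(char if length == 1 else '{}{}'.format(length, char))
    return ''.join(pieces)


def rle_decode(text):
    """Expand each optional count followed by one non-digit character."""
    return re.sub(
        r'(\d*)(\D)',
        lambda m: m.group(2) * int(m.group(1) or 1),
        text,
    )


print(rle_encode('AABCCCDEEEE'))  # 2AB3CD4E
print(rle_decode('2AB3CD4E'))     # AABCCCDEEEE
```

Because the unencoded data never contains digits, any digits in the encoded string can safely be read as the count for the character that follows.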

Exception messages

Sometimes it is necessary to raise an exception. When you do this, you should include a meaningful error message to indicate what the source of the error is. This makes your code more readable and helps significantly with debugging. Not every exercise will require you to raise an exception, but for those that do, the tests will only pass if you include a message.

To raise a message with an exception, just write it as an argument to the exception type. For example, instead of raise Exception, you should write:

raise Exception("Meaningful message indicating the source of the error")
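
A short sketch of how the message surfaces, assuming a hypothetical helper named repeat that is not part of this exercise: the text passed to the exception type is what a caller, or a test, sees when the error is reported.

```python
def repeat(char, count):
    """Hypothetical helper: repeat a single character `count` times."""
    if count < 0:
        # The message names the offending value, which helps with debugging.
        raise ValueError("count must not be negative, got {}".format(count))
    return char * count


try:
    repeat('W', -1)
except ValueError as error:
    print(error)  # count must not be negative, got -1
```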

Running the tests

To run the tests, run the command that matches your Python version:

  • Python 2.7: py.test run_length_encoding_test.py
  • Python 3.4+: pytest run_length_encoding_test.py

Alternatively, you can tell Python to run the pytest module (allowing the same command to be used regardless of Python version): python -m pytest run_length_encoding_test.py

Common pytest options

  • -v : enable verbose output
  • -x : stop running tests on first failure
  • --ff : run failures from previous test before running other test cases

For other options, see python -m pytest -h

Submitting Exercises

When submitting an exercise, make sure the solution is in the $EXERCISM_WORKSPACE/python/run-length-encoding directory.

You can find your Exercism workspace by running exercism debug and looking for the line that starts with Workspace.

For more detailed information about running tests, code style and linting, please see the help page.

Source

Wikipedia https://en.wikipedia.org/wiki/Run-length_encoding

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

run_length_encoding_test.py

import unittest

from run_length_encoding import encode, decode


# Tests adapted from `problem-specifications//canonical-data.json` @ v1.1.0

class RunLengthEncodingTest(unittest.TestCase):
    def test_encode_empty_string(self):
        self.assertMultiLineEqual(encode(''), '')

    def test_encode_single_characters_only_are_encoded_without_count(self):
        self.assertMultiLineEqual(encode('XYZ'), 'XYZ')

    def test_encode_string_with_no_single_characters(self):
        self.assertMultiLineEqual(encode('AABBBCCCC'), '2A3B4C')

    def test_encode_single_characters_mixed_with_repeated_characters(self):
        self.assertMultiLineEqual(
            encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB'),
            '12WB12W3B24WB')

    def test_encode_multiple_whitespace_mixed_in_string(self):
        self.assertMultiLineEqual(encode('  hsqq qww  '), '2 hs2q q2w2 ')

    def test_encode_lowercase_characters(self):
        self.assertMultiLineEqual(encode('aabbbcccc'), '2a3b4c')

    def test_decode_empty_string(self):
        self.assertMultiLineEqual(decode(''), '')

    def test_decode_single_characters_only(self):
        self.assertMultiLineEqual(decode('XYZ'), 'XYZ')

    def test_decode_string_with_no_single_characters(self):
        self.assertMultiLineEqual(decode('2A3B4C'), 'AABBBCCCC')

    def test_decode_single_characters_with_repeated_characters(self):
        self.assertMultiLineEqual(
            decode('12WB12W3B24WB'),
            'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB')

    def test_decode_multiple_whitespace_mixed_in_string(self):
        self.assertMultiLineEqual(decode('2 hs2q q2w2 '), '  hsqq qww  ')

    def test_decode_lower_case_string(self):
        self.assertMultiLineEqual(decode('2a3b4c'), 'aabbbcccc')

    def test_combination(self):
        self.assertMultiLineEqual(decode(encode('zzz ZZ  zZ')), 'zzz ZZ  zZ')


if __name__ == '__main__':
    unittest.main()

run_length_encoding.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re


ENCODED = re.compile(r'(\d*)([A-Za-z\s])')
GROUP = re.compile(r'([A-Za-z ])\1*')


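def is_encoded(string):
    # True when the whole string is a sequence of optional-count + character
    # groups; ENCODED alone describes only one such group.
    return re.fullmatch(r'(?:\d*[A-Za-z\s])*', string) is not None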


def decode(string):
    decoded = []
    # Each match is a (count, character) pair; an empty count means the
    # character appears once.
    for n, c in ENCODED.findall(string):
        decoded.append(c * int(n or 1))
    return ''.join(decoded)


def encode(string):  # optimize this hack thanks
    encoded = []
    while string:
        # GROUP matches one run of identical characters at the start of string.
        run = GROUP.match(string).group()
        if len(run) > 1:
            encoded.append('{}{}'.format(len(run), run[0]))
        else:
            encoded.append(run)
        string = string[len(run):]  # drop the consumed run and continue
    return ''.join(encoded)
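
To see what the two patterns contribute, the following sketch re-declares them and prints what each one matches on small inputs. It is an illustration added here, not part of the submitted solution.

```python
import re

# Same patterns as in the solution above.
ENCODED = re.compile(r'(\d*)([A-Za-z\s])')
GROUP = re.compile(r'([A-Za-z ])\1*')

# findall returns one (count, character) tuple per encoded group;
# the count is an empty string when the character appears once.
print(ENCODED.findall('12WB3C'))  # [('12', 'W'), ('', 'B'), ('3', 'C')]

# match anchors at the start of the string and, through the
# backreference \1, consumes the whole leading run of one character.
print(GROUP.match('WWWBWW').group())  # WWW
```

Note that GROUP accepts only letters and the space character, so input containing digits or other symbols yields no match at that position.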

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give Exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?