Implement run-length encoding and decoding.
Run-length encoding (RLE) is a simple form of data compression, where runs (consecutive data elements) are replaced by just one data value and count.
For example, we can represent the original 53 characters with only 13:
"WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB" -> "12WB12W3B24WB"
RLE allows the original data to be perfectly reconstructed from the compressed data, which makes it a lossless form of data compression.
"AABCCCDEEEE" -> "2AB3CD4E" -> "AABCCCDEEEE"
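The encoding step above can be sketched in a few lines. This is only one possible approach, not the exercise's reference solution: it assumes the standard library's itertools.groupby is used to split the input into runs, and the name rle_encode is a hypothetical helper chosen for illustration.

```python
from itertools import groupby


def rle_encode(text):
    """Hypothetical helper: turn each run into '<count><char>', omitting a count of 1."""
    pieces = []
    # groupby yields one (character, iterator-over-run) pair per consecutive run.
    for char, run in groupby(text):
        length = sum(1 for _ in run)
        pieces.append((str(length) if length > 1 else "") + char)
    return "".join(pieces)


print(rle_encode("AABCCCDEEEE"))
```

Collecting the pieces in a list and joining once avoids repeated string concatenation, which is a design choice rather than a requirement of the exercise.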
For simplicity, you can assume that the unencoded string will only contain the letters A through Z (either lower or upper case) and whitespace. This way data to be encoded will never contain any numbers and numbers inside data to be decoded always represent the count for the following character.
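Because digits never occur in the unencoded data, a decoder can treat every digit as part of the count for the next character. A minimal sketch of that idea follows; the name rle_decode is hypothetical, and the code assumes well-formed input (any digits are followed by a character).

```python
def rle_decode(encoded):
    """Hypothetical helper: expand '<count><char>' pairs; a missing count means 1."""
    decoded = []
    count = ""
    for char in encoded:
        if "0" <= char <= "9":
            # Digits accumulate into the (possibly multi-digit) count.
            count += char
        else:
            # Any other character ends the count and is repeated that many times.
            decoded.append(char * int(count or "1"))
            count = ""
    return "".join(decoded)


print(rle_decode("2AB3CD4E"))
```

If the unencoded data were allowed to contain digits, this scheme would be ambiguous, which is why the assumption above matters.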
Sometimes it is necessary to raise an exception. When you do this, you should include a meaningful error message to indicate what the source of the error is. This makes your code more readable and helps significantly with debugging. Not every exercise will require you to raise an exception, but for those that do, the tests will only pass if you include a message.
To attach a message to an exception, pass it as an argument to the exception type. For example, instead of a bare "raise Exception", you should write:
raise Exception("Meaningful message indicating the source of the error")
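As an illustration of a meaningful message, the sketch below validates input against the stated assumption that only letters and whitespace are encoded. The tests shown for this exercise do not require such a check; check_encodable is a hypothetical helper, and ValueError is just one reasonable choice of exception type.

```python
def check_encodable(text):
    """Hypothetical validation helper, not part of the exercise's required API."""
    for index, char in enumerate(text):
        if not (char.isalpha() or char.isspace()):
            # Say what was wrong and where, so the failure is easy to debug.
            raise ValueError(
                "cannot encode {!r} at index {}: only letters and whitespace "
                "are allowed".format(char, index)
            )
    return text


try:
    check_encodable("AAB1")
except ValueError as error:
    print(error)
```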
To run the tests, run the appropriate command below:
py.test run_length_encoding_test.py
pytest run_length_encoding_test.py
Alternatively, you can tell Python to run the pytest module (allowing the same command to be used regardless of Python version):
python -m pytest run_length_encoding_test.py
Common pytest options
-v : enable verbose output
-x : stop running tests on first failure
--ff : run failures from previous test before running other test cases
For other options, see python -m pytest -h
Note that, when trying to submit an exercise, make sure the solution is in the $EXERCISM_WORKSPACE/python/run-length-encoding directory.
You can find your Exercism workspace by running exercism debug and looking for the line that starts with Workspace.
For more detailed information about running tests, code style and linting, please see the help page.
Wikipedia https://en.wikipedia.org/wiki/Run-length_encoding
It's possible to submit an incomplete solution so you can see how others have completed the exercise.
import unittest

from run_length_encoding import encode, decode


# Tests adapted from `problem-specifications//canonical-data.json` @ v1.1.0

class RunLengthEncodingTest(unittest.TestCase):
    def test_encode_empty_string(self):
        self.assertMultiLineEqual(encode(''), '')

    def test_encode_single_characters_only_are_encoded_without_count(self):
        self.assertMultiLineEqual(encode('XYZ'), 'XYZ')

    def test_encode_string_with_no_single_characters(self):
        self.assertMultiLineEqual(encode('AABBBCCCC'), '2A3B4C')

    def test_encode_single_characters_mixed_with_repeated_characters(self):
        self.assertMultiLineEqual(
            encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB'),
            '12WB12W3B24WB')

    def test_encode_multiple_whitespace_mixed_in_string(self):
        self.assertMultiLineEqual(encode('  hsqq qww  '), '2 hs2q q2w2 ')

    def test_encode_lowercase_characters(self):
        self.assertMultiLineEqual(encode('aabbbcccc'), '2a3b4c')

    def test_decode_empty_string(self):
        self.assertMultiLineEqual(decode(''), '')

    def test_decode_single_characters_only(self):
        self.assertMultiLineEqual(decode('XYZ'), 'XYZ')

    def test_decode_string_with_no_single_characters(self):
        self.assertMultiLineEqual(decode('2A3B4C'), 'AABBBCCCC')

    def test_decode_single_characters_with_repeated_characters(self):
        self.assertMultiLineEqual(
            decode('12WB12W3B24WB'),
            'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB')

    def test_decode_multiple_whitespace_mixed_in_string(self):
        self.assertMultiLineEqual(decode('2 hs2q q2w2 '), '  hsqq qww  ')

    def test_decode_lower_case_string(self):
        self.assertMultiLineEqual(decode('2a3b4c'), 'aabbbcccc')

    def test_combination(self):
        self.assertMultiLineEqual(decode(encode('zzz ZZ zZ')), 'zzz ZZ zZ')


if __name__ == '__main__':
    unittest.main()
import re


def decode(string):
    decoded = ''
    # Each match is (count, char); count is '' when the character is not repeated.
    matches = re.findall(r"(\d*)([a-zA-Z ])", string)
    for m in matches:
        if m[0]:
            snippet = int(m[0]) * m[1]
        else:
            snippet = m[1]
        decoded += snippet
    return decoded


def encode(string):
    count = 1
    last = ""
    encoded = ""
    for char in string:
        if char == last:
            count += 1
        else:
            # Flush the previous run, dropping a count of 1.
            encoded += re.sub('^1$', '', str(count)) + last
            last = char
            count = 1
    # Flush the final run (adds nothing for empty input, since last is still "").
    encoded += re.sub('^1$', '', str(count)) + last
    return encoded