
w1zeman1p's solution

to Run Length Encoding in the Elixir Track

Published at Jul 13 2018 · 2 comments

Note:

This solution was written on an old version of Exercism. The tests below might not correspond to the solution code, and the exercise may have changed since this code was written.

Implement run-length encoding and decoding.

Run-length encoding (RLE) is a simple form of data compression, where runs (consecutive data elements) are replaced by just one data value and count.

For example, the 53 characters below can be represented with only 13:

"WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB"  ->  "12WB12W3B24WB"

RLE allows the original data to be perfectly reconstructed from the compressed data, which makes it a lossless data compression technique.

"AABCCCDEEEE"  ->  "2AB3CD4E"  ->  "AABCCCDEEEE"

For simplicity, you can assume that the unencoded string will only contain the letters A through Z (either lower or upper case) and whitespace. This way, data to be encoded will never contain any numbers, and numbers inside data to be decoded always represent the count for the following character.
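
The encoding rule described above can be sketched in a few lines of Elixir. This is only an illustration of the idea, not the reference solution: the module name `RleSketch` is made up for this example, and a run of length one is written without a count, as in the examples above.

```elixir
defmodule RleSketch do
  # Hypothetical module name, used only for this illustration.

  # Groups consecutive equal graphemes into runs, then writes each run
  # as "<count><char>", omitting the count when the run has length 1.
  def encode(string) do
    string
    |> String.graphemes()
    |> Enum.chunk_by(& &1)
    |> Enum.map_join(fn
      [char] -> char
      [char | _] = run -> "#{length(run)}#{char}"
    end)
  end
end

IO.puts(RleSketch.encode("AABCCCDEEEE")) # prints 2AB3CD4E
```

`Enum.chunk_by/2` does the run detection here, so no explicit recursion is needed.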

Running tests

Execute the tests with:

$ elixir rle_test.exs

Pending tests

In the test suites, all but the first test have been skipped.

Once you get a test passing, you can unskip the next one by commenting out the relevant @tag :pending with a # symbol.

For example:

# @tag :pending
test "shouting" do
  assert Bob.hey("WATCH OUT!") == "Whoa, chill out!"
end

Or, you can enable all the tests by commenting out the ExUnit.configure line in the test suite.

# ExUnit.configure exclude: :pending, trace: true

For more detailed information about the Elixir track, please see the help page.

Source

Wikipedia https://en.wikipedia.org/wiki/Run-length_encoding

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

rle_test.exs

if !System.get_env("EXERCISM_TEST_EXAMPLES") do
  Code.load_file("rle.exs", __DIR__)
end

ExUnit.start()
ExUnit.configure(exclude: :pending, trace: true)

defmodule RunLengthEncoderTest do
  use ExUnit.Case

  test "encode empty string" do
    assert RunLengthEncoder.encode("") === ""
  end

  @tag :pending
  test "encode single characters only are encoded without count" do
    assert RunLengthEncoder.encode("XYZ") === "XYZ"
  end

  @tag :pending
  test "encode string with no single characters" do
    assert RunLengthEncoder.encode("AABBBCCCC") == "2A3B4C"
  end

  @tag :pending
  test "encode single characters mixed with repeated characters" do
    assert RunLengthEncoder.encode("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB") ===
             "12WB12W3B24WB"
  end

  @tag :pending
  test "encode multiple whitespace mixed in string" do
    assert RunLengthEncoder.encode("  hsqq qww  ") === "2 hs2q q2w2 "
  end

  @tag :pending
  test "encode lowercase characters" do
    assert RunLengthEncoder.encode("aabbbcccc") === "2a3b4c"
  end

  @tag :pending
  test "decode empty string" do
    assert RunLengthEncoder.decode("") === ""
  end

  @tag :pending
  test "decode single characters only" do
    assert RunLengthEncoder.decode("XYZ") === "XYZ"
  end

  @tag :pending
  test "decode string with no single characters" do
    assert RunLengthEncoder.decode("2A3B4C") == "AABBBCCCC"
  end

  @tag :pending
  test "decode single characters with repeated characters" do
    assert RunLengthEncoder.decode("12WB12W3B24WB") ===
             "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB"
  end

  @tag :pending
  test "decode multiple whitespace mixed in string" do
    assert RunLengthEncoder.decode("2 hs2q q2w2 ") === "  hsqq qww  "
  end

  @tag :pending
  test "decode lower case string" do
    assert RunLengthEncoder.decode("2a3b4c") === "aabbbcccc"
  end

  @tag :pending
  test "encode followed by decode gives original string" do
    original = "zzz ZZ  zZ"
    encoded = RunLengthEncoder.encode(original)
    assert RunLengthEncoder.decode(encoded) === original
  end
end

rle.exs

defmodule RunLengthEncoder do
  @doc """
  Generates a string where consecutive elements are represented as a data value and count.
  "HORSE" => "1H1O1R1S1E"
  For this example, assume every input is a string of uppercase letters.
  It should also be able to reconstruct the data into its original form.
  "1H1O1R1S1E" => "HORSE"
  """
  @spec encode(String.t) :: String.t
  def encode(""), do: ""
  def encode(string) do
    String.graphemes(string)
    |> Enum.map(&({ 1, &1 }))
    |> collect
    |> Enum.map(&Tuple.to_list/1)
    |> List.flatten
    |> Enum.join
  end

  # Merges adjacent {count, letter} pairs that share a letter by summing their counts.
  @spec collect([{integer, String.t}]) :: [{integer, String.t}]
  def collect([]), do: []
  def collect([x]), do: [x]
  def collect([{ count, letter } | tail ]) do
    {_, next_letter} = hd(tail)
    if letter == next_letter do
      collect([{ count + 1, letter } | tl(tail) ])
    else
      [{count, letter} | collect(tail)]
    end
  end

  @spec decode(String.t) :: String.t
  def decode(string) do
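    # \D matches any non-digit, so whitespace runs decode too; :all_names returns captures sorted by name (count, letter).
    Regex.scan(~r/(?<count>\d+)(?<letter>\D)/, string, capture: :all_names)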
    |> Enum.map(&spread/1)
    |> Enum.join
  end

  @spec spread([String.t]) :: String.t
  def spread([count, letter]) do
    count = String.to_integer(count)
    String.duplicate(letter, count)
  end
end
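
The module above follows its own @doc and always emits a count, even for a run of length one, whereas the test file shown earlier expects single characters to be written without a count and expects `decode` to accept them. One possible adaptation is sketched below. It is not the author's code: the module name `RunLengthEncoderV2` and the helper `format/1` are hypothetical, and the input is assumed to contain only letters and whitespace, as the instructions state.

```elixir
defmodule RunLengthEncoderV2 do
  # Hypothetical name; a sketch adapted from the solution above.

  def encode(string) do
    string
    |> String.graphemes()
    |> Enum.map(&{1, &1})
    |> collect()
    |> Enum.map_join(&format/1)
  end

  # Two neighbouring pairs with the same letter are merged into one.
  defp collect([{count, letter}, {_, letter} | rest]), do: collect([{count + 1, letter} | rest])
  defp collect([pair | rest]), do: [pair | collect(rest)]
  defp collect([]), do: []

  # A run of length one is written without its count.
  defp format({1, letter}), do: letter
  defp format({count, letter}), do: "#{count}#{letter}"

  def decode(string) do
    # An optional count followed by exactly one non-digit character.
    ~r/([0-9]*)([^0-9])/u
    |> Regex.scan(string, capture: :all_but_first)
    |> Enum.map_join(fn
      ["", letter] -> letter
      [count, letter] -> String.duplicate(letter, String.to_integer(count))
    end)
  end
end
```

Matching `[^0-9]` rather than `\w` lets whitespace be decoded like any other character, and the empty first capture marks a character that had no count.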

Community comments

w1zeman1p

Not super happy with how complex this solution is, but happy that it works :)

sahglie

Wow, I really learned a lot from your solution. I'm new to Elixir, so it took me a while to figure out exactly what was going on with some of the pattern matching. Awesome stuff, though. A few things crossed my mind as I went through your code.

  1. String.graphemes : I had to look this up as I wasn't sure what it did. Looks like it does the same thing as String.codepoints -- which I am familiar with. I wonder if String.codepoints is more clear? Why use one over the other?

  2. Enum.map(&({ 1, &1 })) : this is some pretty funky syntax, and although I love its terseness, I think it's a little less friendly on the eyes than:

Enum.map(fn(x) -> { 1, x } end)

In this instance, the standard anonymous function is almost as terse.

  3. How would using cond in your collect function, rather than three separate collect clauses, change its readability?

Overall great solution!
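
Two of the questions above can be explored with a short experiment. The sketch below first shows a case where `String.graphemes` and `String.codepoints` give different results, then rewrites `collect` with `cond` for comparison. The module name `CollectWithCond` is hypothetical, and the rewrite is only meant to behave like the three-clause version, not to replace it.

```elixir
# "e" followed by a combining acute accent (U+0301): one grapheme, two codepoints.
accented = "e\u0301"
IO.inspect(length(String.graphemes(accented)))  # prints 1
IO.inspect(length(String.codepoints(accented))) # prints 2

defmodule CollectWithCond do
  # Hypothetical module name; intended to behave like the three-clause collect/1.
  def collect(pairs) do
    cond do
      # Zero or one pair: nothing left to merge.
      pairs == [] or tl(pairs) == [] ->
        pairs

      # The first two pairs share a letter: merge them and keep going.
      elem(hd(pairs), 1) == elem(hd(tl(pairs)), 1) ->
        [{count, letter}, _next | rest] = pairs
        collect([{count + 1, letter} | rest])

      true ->
        [hd(pairs) | collect(tl(pairs))]
    end
  end
end
```

For the letters and whitespace this exercise allows, graphemes and codepoints coincide, so either function would work. In the `cond` version the list has to be taken apart by hand with `hd`, `tl`, and `elem`, which is arguably harder to read than letting function clauses do the destructuring.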
