Avatar of davearonson

davearonson's solution

to Parallel Letter Frequency in the Elixir Track

Published at Jul 13 2018 · 1 comment
Instructions
Test suite
Solution

Note:

This solution was written on an old version of Exercism. The tests below might not correspond to the solution code, and the exercise may have changed since this code was written.

Count the frequency of letters in texts using parallel computation.

Parallelism is about doing things in parallel that can also be done sequentially. A common example is counting the frequency of letters. Create a function that returns the total frequency of each letter in a list of texts and that employs parallelism.

Running tests

Execute the tests with:

$ elixir parallel_letter_frequency_test.exs

Pending tests

In the test suites, all but the first test have been skipped.

Once you get a test passing, you can unskip the next one by commenting out the relevant @tag :pending with a # symbol.

For example:

# @tag :pending
test "shouting" do
  assert Bob.hey("WATCH OUT!") == "Whoa, chill out!"
end

Or, you can enable all the tests by commenting out the ExUnit.configure line in the test suite.

# ExUnit.configure exclude: :pending, trace: true

For more detailed information about the Elixir track, please see the help page.

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

parallel_letter_frequency_test.exs

if !System.get_env("EXERCISM_TEST_EXAMPLES") do
  Code.load_file("frequency.exs", __DIR__)
end

ExUnit.start()
ExUnit.configure(exclude: :pending, trace: true)

# Your code should contain a frequency(texts, workers) function which accepts a
# list of texts and the number of workers to use in parallel.

defmodule FrequencyTest do
  use ExUnit.Case

  # Poem by Friedrich Schiller. The corresponding music is the European Anthem.
  @ode_an_die_freude """
  Freude schöner Götterfunken
  Tochter aus Elysium,
  Wir betreten feuertrunken,
  Himmlische, dein Heiligtum!
  Deine Zauber binden wieder
  Was die Mode streng geteilt;
  Alle Menschen werden Brüder,
  Wo dein sanfter Flügel weilt.
  """

  # Dutch national anthem
  @wilhelmus """
  Wilhelmus van Nassouwe
  ben ik, van Duitsen bloed,
  den vaderland getrouwe
  blijf ik tot in den dood.
  Een Prinse van Oranje
  ben ik, vrij, onverveerd,
  den Koning van Hispanje
  heb ik altijd geëerd.
  """

  # American national anthem
  @star_spangled_banner """
  O say can you see by the dawn's early light,
  What so proudly we hailed at the twilight's last gleaming,
  Whose broad stripes and bright stars through the perilous fight,
  O'er the ramparts we watched, were so gallantly streaming?
  And the rockets' red glare, the bombs bursting in air,
  Gave proof through the night that our flag was still there;
  O say does that star-spangled banner yet wave,
  O'er the land of the free and the home of the brave?
  """

  # Returns the frequencies in a sorted list. This means it doesn't matter if
  # your frequency() function returns a list of pairs or a map, the
  # testing code will handle it.
  defp freq(texts, workers \\ 4) do
    Frequency.frequency(texts, workers) |> Enum.sort() |> Enum.into(%{})
  end

  # @tag :pending
  test "no texts mean no letters" do
    assert freq([]) == %{}
  end

  @tag :pending
  test "one letter" do
    assert freq(["a"]) == %{"a" => 1}
  end

  @tag :pending
  test "case insensitivity" do
    assert freq(["aA"]) == %{"a" => 2}
  end

  @tag :pending
  test "many empty texts still mean no letters" do
    assert freq(List.duplicate("  ", 10000)) == %{}
  end

  @tag :pending
  test "many times the same text gives a predictable result" do
    assert freq(List.duplicate("abc", 1000)) == %{"a" => 1000, "b" => 1000, "c" => 1000}
  end

  @tag :pending
  test "punctuation doesn't count" do
    assert freq([@ode_an_die_freude])[","] == nil
  end

  @tag :pending
  test "numbers don't count" do
    assert freq(["Testing, 1, 2, 3"])["1"] == nil
  end

  @tag :pending
  test "all three anthems, together, 1 worker" do
    freqs = freq([@ode_an_die_freude, @wilhelmus, @star_spangled_banner], 1)
    assert freqs["a"] == 49
    assert freqs["t"] == 56
    assert freqs["ü"] == 2
  end

  @tag :pending
  test "all three anthems, together, 4 workers" do
    freqs = freq([@ode_an_die_freude, @wilhelmus, @star_spangled_banner], 4)
    assert freqs["a"] == 49
    assert freqs["t"] == 56
    assert freqs["ü"] == 2
  end
end
defmodule Frequency do
  @doc """
  Count letter frequency in parallel.

  Returns a map of characters to frequencies.

  The number of worker processes to use can be set with 'workers'.
  """
  @spec frequency([String.t], pos_integer) :: map
  def frequency(texts, workers) do
    do_frequency(texts, workers)
  end

  defp do_frequency(texts, workers) do
    texts
    |> Enum.flat_map(&(String.split(&1, "\n")))
    |> split_among(workers, 0, {})
    |> spawn_workers(self, [])
    |> Enum.map(&get_results/1)
    |> Enum.reduce(%{}, &combine_results/2)
  end

  defp split_among([], _, _, acc), do: Tuple.to_list(acc)

  defp split_among([hd|tl], workers, cur, acc) when tuple_size(acc) < workers do
    split_among(tl, workers, rem(cur + 1, workers), Tuple.append(acc, [hd]))
  end

  defp split_among([hd|tl], workers, cur, acc) do
    split_among(tl,
                workers,
                rem(cur + 1, workers),
                put_elem(acc, cur, [hd | elem(acc, cur)]))
  end

  defp spawn_workers([], _, acc), do: acc
  defp spawn_workers([lines|more], caller, acc) do
    spawn_workers(more, caller, [spawn(fn -> analyze(lines, caller) end) | acc])
  end

  defp analyze(lines, caller) do
    send(caller,
         lines
         |> Enum.join
         |> String.downcase
         |> get_letters
         |> Enum.reduce(%{}, &count_char/2))
  end

  defp get_letters(text) do
    Regex.scan(~r/\pL/u, text) |> List.flatten
  end

  defp count_char(char, acc) do
    Map.put(acc, char, Map.get(acc, char, 0) + 1)
  end

  defp get_results(_) do
    receive do
      whatever -> whatever
    end
  end

  defp combine_results(elt, acc) do
    Map.merge(elt, acc, fn _k, v1, v2 -> v1 + v2 end)
  end

end

Community comments

Find this solution interesting? Ask the author a question to learn more.
Avatar of davearonson

Still not too happy with the performance -- 0.03 sec on my main box, while all previous Elixir exercises have been 0.00. Tried a bit of homegrown profiling, which indicated that analyze was the worst, at often over 10 ms. Might try some more tweaks, maybe some "real" profiling, but I have bigger fish to fry right now....

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?