🎉 Exercism Research is now launched. Help Exercism, help science and have some fun at research.exercism.io 🎉
Avatar of bathrobe

bathrobe's solution

to Parallel Letter Frequency in the Elixir Track

Published at Mar 31 2021 · 1 comment
Test suite

Count the frequency of letters in texts using parallel computation.

Parallelism is about doing things in parallel that can also be done sequentially. A common example is counting the frequency of letters. Create a function that returns the total frequency of each letter in a list of texts and that employs parallelism.

Running tests

Execute the tests with:

$ mix test

Pending tests

In the test suites, all but the first test have been skipped.

Once you get a test passing, you can unskip the next one by commenting out the relevant @tag :pending with a # symbol.

For example:

# @tag :pending
test "shouting" do
  assert Bob.hey("WATCH OUT!") == "Whoa, chill out!"

Or, you can enable all the tests by commenting out the ExUnit.configure line in the test suite.

# ExUnit.configure exclude: :pending, trace: true

If you're stuck on something, it may help to look at some of the available resources out there where answers might be found.

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.


defmodule FrequencyTest do
  use ExUnit.Case

  # Poem by Friedrich Schiller. The corresponding music is the European Anthem.
  @ode_an_die_freude """
  Freude schöner Götterfunken
  Tochter aus Elysium,
  Wir betreten feuertrunken,
  Himmlische, dein Heiligtum!
  Deine Zauber binden wieder
  Was die Mode streng geteilt;
  Alle Menschen werden Brüder,
  Wo dein sanfter Flügel weilt.

  # Dutch national anthem
  @wilhelmus """
  Wilhelmus van Nassouwe
  ben ik, van Duitsen bloed,
  den vaderland getrouwe
  blijf ik tot in den dood.
  Een Prinse van Oranje
  ben ik, vrij, onverveerd,
  den Koning van Hispanje
  heb ik altijd geëerd.

  # American national anthem
  @star_spangled_banner """
  O say can you see by the dawn's early light,
  What so proudly we hailed at the twilight's last gleaming,
  Whose broad stripes and bright stars through the perilous fight,
  O'er the ramparts we watched, were so gallantly streaming?
  And the rockets' red glare, the bombs bursting in air,
  Gave proof through the night that our flag was still there;
  O say does that star-spangled banner yet wave,
  O'er the land of the free and the home of the brave?

  # Returns the frequencies in a sorted list. This means it doesn't matter if
  # your frequency() function returns a list of pairs or a map, the
  # testing code will handle it.
  defp freq(texts, workers \\ 4) do
    Frequency.frequency(texts, workers) |> Enum.sort() |> Enum.into(%{})

  # @tag :pending
  test "no texts mean no letters" do
    assert freq([]) == %{}

  @tag :pending
  test "one letter" do
    assert freq(["a"]) == %{"a" => 1}

  @tag :pending
  test "case insensitivity" do
    assert freq(["aA"]) == %{"a" => 2}

  @tag :pending
  test "many empty texts still mean no letters" do
    assert freq(List.duplicate("  ", 10000)) == %{}

  @tag :pending
  test "many times the same text gives a predictable result" do
    assert freq(List.duplicate("abc", 1000)) == %{"a" => 1000, "b" => 1000, "c" => 1000}

  @tag :pending
  test "punctuation doesn't count" do
    assert freq([@ode_an_die_freude])[","] == nil

  @tag :pending
  test "numbers don't count" do
    assert freq(["Testing, 1, 2, 3"])["1"] == nil

  @tag :pending
  test "all three anthems, together, 1 worker" do
    freqs = freq([@ode_an_die_freude, @wilhelmus, @star_spangled_banner], 1)
    assert freqs["a"] == 49
    assert freqs["t"] == 56
    assert freqs["ü"] == 2

  @tag :pending
  test "all three anthems, together, 4 workers" do
    freqs = freq([@ode_an_die_freude, @wilhelmus, @star_spangled_banner], 4)
    assert freqs["a"] == 49
    assert freqs["t"] == 56
    assert freqs["ü"] == 2


ExUnit.configure(exclude: :pending, trace: true)
defmodule Frequency do
  @doc """
  Count letter frequency in parallel.

  Returns a map of characters to frequencies.
  @spec frequency([String.t()], pos_integer) :: map
  def frequency(texts, workers) do

# first create a stream that will process each text in parallel
stream = Task.async_stream(texts, fn txt -> 
  if txt != nil do
    # convert each formatted text to graphemes
    grp = String.graphemes(
      String.downcase(String.replace(txt, ~r/[1234567890!#$%&()*+,.:;<=>?@\^_`{|}~-]/, ""))
    # filter out empty spaces and count the frequencies 
     Enum.filter(grp, fn x -> x != " " end) |> Enum.frequencies()
  else %{} end
end, max_concurrency: workers)

# remove :ok atom from result and run all processes
total = Enum.map(stream, fn x -> elem(x,1) end)
# reduce all maps in total list by merging
List.foldl(total, %{}, fn x, a -> Map.merge(x, a, fn _k, v1, v2 ->
  v1 + v2

Community comments

Find this solution interesting? Ask the author a question to learn more.
Avatar of bathrobe
Solution Author
commented 20 days ago

Update! I started writing a short blog post about learning Elixir today, and I found Enum.frequencies staring me in the face at elixir-lang.org. I couldn't allow myself to not use it in my solution :)

Very pleased to find the List.foldl method accepted maps. Iterating through the list with a Map.merge and accompanying function shortened the length of the code considerably.

I also refactored with the |> syntax for passing parameters.

(edited 20 days ago)

bathrobe's Reflection

Whew, this took a while to figure out. Coming from a JS background I kept looking for array syntax that wasn't there. But Elixir's methods for handling lists and maps are really good.

I first figured out how to calc letter frequency in a single thread. Lost some time to UTF-8 issues in Windows, and used `to_charlist` when I should've started with graphemes. Definitely learned a lot about special characters.

The Task module was easier to understand than async/await in JS :) But that pesky `:ok` atom kept chasing me around my list traversals-- I imagine there's some far better way to handle it than `elem(x,1)` every time I want to grab my result, but it worked.

Was definitely a very pleasant feeling to see all the tests calculate fast. Working out solutions in a single-threaded way first, I saw how much slower they were before using the `async_stream`.