Count the frequency of letters in texts using parallel computation.
Parallelism is about doing things in parallel that can also be done sequentially. A common example is counting the frequency of letters. Create a function that returns the total frequency of each letter in a list of texts and that employs parallelism.
Execute the tests with:
$ mix test
In the test suites, all but the first test have been skipped.
Once you get a test passing, you can unskip the next one by
commenting out the relevant @tag :pending
with a #
symbol.
For example:
# @tag :pending
test "shouting" do
assert Bob.hey("WATCH OUT!") == "Whoa, chill out!"
end
Or, you can enable all the tests by commenting out the
ExUnit.configure
line in the test suite.
# ExUnit.configure exclude: :pending, trace: true
If you're stuck on something, it may help to look at some of the available resources out there where answers might be found.
It's possible to submit an incomplete solution so you can see how others have completed the exercise.
defmodule FrequencyTest do
use ExUnit.Case
# Poem by Friedrich Schiller. The corresponding music is the European Anthem.
@ode_an_die_freude """
Freude schöner Götterfunken
Tochter aus Elysium,
Wir betreten feuertrunken,
Himmlische, dein Heiligtum!
Deine Zauber binden wieder
Was die Mode streng geteilt;
Alle Menschen werden Brüder,
Wo dein sanfter Flügel weilt.
"""
# Dutch national anthem
@wilhelmus """
Wilhelmus van Nassouwe
ben ik, van Duitsen bloed,
den vaderland getrouwe
blijf ik tot in den dood.
Een Prinse van Oranje
ben ik, vrij, onverveerd,
den Koning van Hispanje
heb ik altijd geëerd.
"""
# American national anthem
@star_spangled_banner """
O say can you see by the dawn's early light,
What so proudly we hailed at the twilight's last gleaming,
Whose broad stripes and bright stars through the perilous fight,
O'er the ramparts we watched, were so gallantly streaming?
And the rockets' red glare, the bombs bursting in air,
Gave proof through the night that our flag was still there;
O say does that star-spangled banner yet wave,
O'er the land of the free and the home of the brave?
"""
# Returns the frequencies in a sorted list. This means it doesn't matter if
# your frequency() function returns a list of pairs or a map, the
# testing code will handle it.
defp freq(texts, workers \\ 4) do
Frequency.frequency(texts, workers) |> Enum.sort() |> Enum.into(%{})
end
# @tag :pending
test "no texts mean no letters" do
assert freq([]) == %{}
end
@tag :pending
test "one letter" do
assert freq(["a"]) == %{"a" => 1}
end
@tag :pending
test "case insensitivity" do
assert freq(["aA"]) == %{"a" => 2}
end
@tag :pending
test "many empty texts still mean no letters" do
assert freq(List.duplicate(" ", 10000)) == %{}
end
@tag :pending
test "many times the same text gives a predictable result" do
assert freq(List.duplicate("abc", 1000)) == %{"a" => 1000, "b" => 1000, "c" => 1000}
end
@tag :pending
test "punctuation doesn't count" do
assert freq([@ode_an_die_freude])[","] == nil
end
@tag :pending
test "numbers don't count" do
assert freq(["Testing, 1, 2, 3"])["1"] == nil
end
@tag :pending
test "all three anthems, together, 1 worker" do
freqs = freq([@ode_an_die_freude, @wilhelmus, @star_spangled_banner], 1)
assert freqs["a"] == 49
assert freqs["t"] == 56
assert freqs["ü"] == 2
end
@tag :pending
test "all three anthems, together, 4 workers" do
freqs = freq([@ode_an_die_freude, @wilhelmus, @star_spangled_banner], 4)
assert freqs["a"] == 49
assert freqs["t"] == 56
assert freqs["ü"] == 2
end
end
ExUnit.start()
ExUnit.configure(exclude: :pending, trace: true)
defmodule Frequency do
@doc """
Count letter frequency in parallel.
Returns a map of characters to frequencies.
The number of worker processes to use can be set with 'workers'.
"""
@spec frequency([String.t()], pos_integer) :: map
def frequency(texts, workers) do
# To maximize the workers, I will be sending them equal amount of work
# Combine all the text together, to determine how it will be split
all_letters = texts |> Enum.reduce("", fn val, acc -> acc <> val end)
total_length = String.length(all_letters)
# Get the minimum number of letters for each worker
equal_share_per_worker = div(total_length, workers)
# In the case where the amount of work is not perfectly divisible amongst the workers
# Get the left over work
remaining_share = rem(total_length, workers)
# Here we build up a work distribution array, making use of the min and leftover work
# e.g 5 letters to be distributed over 4 workers
# min = 1, leftover = 1
# work_distribution = [2, 1, 1, 1]
# leftover was added to the first worker
# using math to model to above
work_distribution = Enum.map(1..workers, fn worker ->
if worker <= remaining_share do
equal_share_per_worker + 1
else
equal_share_per_worker
end
end)
# Given the work distribution, we create the workers
# and assign them the appropriate number of letters to work on
{tasks, _} = Enum.map_reduce(work_distribution, 0, fn work_size, start_index ->
text_portion = String.slice(all_letters, start_index, work_size)
{ Task.async(fn -> count_letters(text_portion) end), start_index + work_size - 1 }
end)
# await the result for all the tasks
results = Enum.map(tasks, &Task.await/1)
# merge their individual counts, to compute global count
Enum.reduce(results, %{}, &merge_map_sum(&1, &2))
end
# Make lowercase, extract each letter, count and generate map
def count_letters(text) do
letters = text |> String.downcase |> String.graphemes
Enum.reduce(
letters,
%{},
fn letter, freqMap ->
Map.put(freqMap, letter, (freqMap[letter] || 0) + 1)
end
)
end
# Combine map by summing, also remove letters that are not allowed by test
def merge_map_sum(map1, map2) do
Map.merge(map1, map2, fn _k, v1, v2 -> v1 + v2 end)
|> remove_letters
end
def remove_letters(map) do
# Using ascii values will probably be better for this
to_remove = [" ", ",", "1", "2", "3", "4", "5", "6", "7", "8", "9", "0" ]
Enum.reduce(to_remove, map, fn val, acc ->
Map.delete(acc, val)
end)
end
end
This took so long, learnt the most from it. Will even learn more by looking at the solutions of others.
Level up your programming skills with 3,450 exercises across 52 languages, and insightful discussion with our volunteer team of welcoming mentors. Exercism is 100% free forever.
Sign up Learn More
Community comments