Avatar of mscno

mscno's solution

to Parallel Letter Frequency in the Scala Track

Published at May 10 2019 · 0 comments
Instructions
Test suite
Solution

Count the frequency of letters in texts using parallel computation.

Parallelism is about doing things in parallel that can also be done sequentially. A common example is counting the frequency of letters. Create a function that returns the total frequency of each letter in a list of texts and that employs parallelism.

Hints

According to this terminology you should write a parallel and deterministic program and (by all means!) let Scala deal with the concurrency aspect. Or else your code could quickly get messy and error-prone with all kinds of nasty concurrency bugs. In particular your program could become indeterministic which spells in practice: very (in fact, VERY) hard to debug, test and reason about.

Having said that it might be a good idea to first write a sequential solution (and use the test suite to verify it). Only then should you try to parallelize it while keeping the sequential and parallel portions of your code as separate as possible.

A first iteration could be using Scala's parallel collections. You might find that this is almost too simple (especially if you have followed our advice and already have a sequential solution).

For the second iteration we recommend you try a solution with scala.concurrent.Future. You can consult this tutorial and its sequel for some help. Make sure that you

  • have only one single blocking call to wait for the result
  • that it is at the very end of your program, and
  • that it has a timeout.

scala.concurrent.Future is used in many libraries and the doctor's advice for parallel and asynchronous programming in Scala. So it is essential for mastering the language and it should become part of your Scala armory.

The Scala exercises assume an SBT project scheme. The exercise solution source should be placed within the exercise directory/src/main/scala. The exercise unit tests can be found within the exercise directory/src/test/scala.

To run the tests simply run the command sbt test in the exercise directory.

For more detailed info about the Scala track see the help page.

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

FrequencyTest.scala

import org.scalatest.{Matchers, FunSuite}

/** @version created manually **/
class FrequencyTest extends FunSuite with Matchers {

  // Poem by Friedrich Schiller. The corresponding music is the European
  // Anthem.
  private val ode_an_die_freude = List(
    "Freude schöner Götterfunken"
    , "Tochter aus Elysium,"
    , "Wir betreten feuertrunken,"
    , "Himmlische, dein Heiligtum!"
    , "Deine Zauber binden wieder"
    , "Was die Mode streng geteilt;"
    , "Alle Menschen werden Brüder,"
    , "Wo dein sanfter Flügel weilt.")

  //Dutch national anthem
  private val wilhelmus = List(
    "Wilhelmus van Nassouwe"
    , "ben ik, van Duitsen bloed,"
    , "den vaderland getrouwe"
    , "blijf ik tot in den dood."
    , "Een Prinse van Oranje"
    , "ben ik, vrij, onverveerd,"
    , "den Koning van Hispanje"
    , "heb ik altijd geëerd.")

  //American national anthem
  private val star_spangled_banner = List(
    "O say can you see by the dawn's early light,"
    , "What so proudly we hailed at the twilight's last gleaming,"
    , "Whose broad stripes and bright stars through the perilous fight,"
    , "O'er the ramparts we watched, were so gallantly streaming?"
    , "And the rockets' red glare, the bombs bursting in air,"
    , "Gave proof through the night that our flag was still there;"
    , "O say does that star-spangled banner yet wave,"
    , "O'er the land of the free and the home of the brave?")

  test("Empty texts") {
    Frequency.frequency(1, Seq()) should be (Map())
  }

  test("Single letter") {
    pending
    Frequency.frequency(10, Seq("a")) should be (Map('a' -> 1))
  }

  test("Case insensitivity") {
    pending
    Frequency.frequency(1000, Seq("aA")) should be (Map('a' -> 2))
  }

  test("Many empty texts") {
    pending
    Frequency.frequency(1, Iterator.fill(10000)("  ").toSeq) should be (Map())
  }

  test("many times the same text gives a predictable result") {
    pending
    Frequency.frequency(1, Iterator.fill(1000)("abc").toSeq) should
      be (Map('a' -> 1000, 'b' -> 1000, 'c' -> 1000))
  }

  test("Ignore punctuation") {
    pending
    Frequency.frequency(1, ode_an_die_freude).get(',') should be (None)
  }

  test("Ignore numbers") {
    pending
    Frequency.frequency(1, Seq("Testing 1, 2, 3")).get('1') should be (None)
  }

  test("All three anthems - 1 worker") {
    pending
    val freqs = Frequency.frequency(1, ode_an_die_freude ++ wilhelmus ++ star_spangled_banner)
    freqs.get('a') should be (Some(49))
    freqs.get('t') should be (Some(56))
    freqs.get('ü') should be (Some(2))
  }

  test("All three anthems - 4 workers") {
    pending
    val freqs = Frequency.frequency(4, ode_an_die_freude ++ wilhelmus ++ star_spangled_banner)
    freqs.get('a') should be (Some(49))
    freqs.get('t') should be (Some(56))
    freqs.get('ü') should be (Some(2))
  }
}
import scala.collection.parallel.ForkJoinTaskSupport
import scala.concurrent.forkjoin.ForkJoinPool

object Frequency {

  implicit def char2string(c: Char): String = c.toString

  def frequency(numWorkers: Int, texts: Seq[String]): Map[Char, Int] = { //Map[Char, Int]
    val parTexts = texts.par
    parTexts.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(numWorkers))
    parTexts.flatMap(_.toLowerCase).filter(c => c.matches("[a-zÀ-ÿ]+")).groupBy(c => c).mapValues(_.size).seq.toMap
  }

}

Community comments

Find this solution interesting? Ask the author a question to learn more.

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?