billwanjohi's solution

to Parallel Letter Frequency in the Python Track

Published at Jul 13 2018 · 1 comment

Note:

This solution was written on an old version of Exercism. The tests below might not correspond to the solution code, and the exercise may have changed since this code was written.

Count the frequency of letters in texts using parallel computation.

Parallelism is about doing things in parallel that can also be done sequentially. A common example is counting the frequency of letters. Create a function that returns the total frequency of each letter in a list of texts and that employs parallelism.
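
As a rough sketch of the idea (not the reference solution), the per-text counting can be mapped over a pool of workers and the partial counts merged afterwards. The example below assumes a thread pool from `concurrent.futures` and a hypothetical helper named `count_letters`; `calculate` is the name the test suite imports.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_letters(text):
    # Hypothetical helper: count alphabetic characters in one text, ignoring case.
    return Counter(ch for ch in text.lower() if ch.isalpha())


def calculate(texts):
    # Map the counting over worker threads, then merge the partial counts.
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(count_letters, texts):
            total.update(partial)
    return dict(total)
```

In standard CPython, threads doing CPU-bound work are generally limited by the global interpreter lock, so a process pool would be the analogous choice when real parallel speedup matters.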

Exception messages

Sometimes it is necessary to raise an exception. When you do this, you should include a meaningful error message to indicate what the source of the error is. This makes your code more readable and helps significantly with debugging. Not every exercise will require you to raise an exception, but for those that do, the tests will only pass if you include a message.

To raise a message with an exception, just write it as an argument to the exception type. For example, instead of raise Exception, you should write:

raise Exception("Meaningful message indicating the source of the error")

Running the tests

To run the tests, run the appropriate command below:

  • Python 2.7: py.test parallel_letter_frequency_test.py
  • Python 3.4+: pytest parallel_letter_frequency_test.py

Alternatively, you can tell Python to run the pytest module (allowing the same command to be used regardless of Python version): python -m pytest parallel_letter_frequency_test.py

Common pytest options

  • -v : enable verbose output
  • -x : stop running tests on first failure
  • --ff : run failures from previous test before running other test cases

For other options, see python -m pytest -h

Submitting Exercises

When submitting an exercise, make sure the solution is in the $EXERCISM_WORKSPACE/python/parallel-letter-frequency directory.

You can find your Exercism workspace by running exercism debug and looking for the line that starts with Workspace.

For more detailed information about running tests, code style and linting, please see the help page.

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

parallel_letter_frequency_test.py

# -*- coding: utf-8 -*-
from collections import Counter
import unittest

from parallel_letter_frequency import calculate


class ParallelLetterFrequencyTest(unittest.TestCase):
    def test_one_letter(self):
        actual = calculate(['a'])
        expected = {'a': 1}
        self.assertDictEqual(actual, expected)

    def test_case_insensitivity(self):
        actual = calculate(['aA'])
        expected = {'a': 2}
        self.assertDictEqual(actual, expected)

    def test_numbers(self):
        actual = calculate(['012', '345', '6789'])
        expected = {}
        self.assertDictEqual(actual, expected)

    def test_punctuations(self):
        actual = calculate(['[]\\;,', './{}|', ':"<>?'])
        expected = {}
        self.assertDictEqual(actual, expected)

    def test_whitespaces(self):
        actual = calculate(['  ', '\t ', '\n\n'])
        expected = {}
        self.assertDictEqual(actual, expected)

    def test_repeated_string_with_known_frequencies(self):
        letter_frequency = 3
        text_input = 'abc\n' * letter_frequency
        actual = calculate(text_input.split('\n'))
        expected = {'a': letter_frequency, 'b': letter_frequency,
                    'c': letter_frequency}
        self.assertDictEqual(actual, expected)

    def test_multiline_text(self):
        text_input = "3 Quotes from Excerism Homepage:\n" + \
                     "\tOne moment you feel like you're\n" + \
                     "getting it. The next moment you're\n" + \
                     "stuck.\n" + \
                     "\tYou know what it’s like to be fluent.\n" + \
                     "Suddenly you’re feeling incompetent\n" + \
                     "and clumsy.\n" + \
                     "\tHaphazard, convoluted code is\n" + \
                     "infuriating, not to mention costly. That\n" + \
                     "slapdash explosion of complexity is an\n" + \
                     "expensive yak shave waiting to\n" + \
                     "happen."
        actual = calculate(text_input.split('\n'))
        expected = Counter([x for x in text_input.lower() if x.isalpha()])
        self.assertDictEqual(actual, expected)


if __name__ == '__main__':
    unittest.main()

parallel_letter_frequency.py

import collections
import multiprocessing
import queue
import re
import time


def calc_one_string(q, s):
    counts = collections.defaultdict(int)
    for char in re.sub('[^a-z]', '', s.lower()):
        counts[char] += 1
    for k, v in counts.items():
        q.put((k, v))


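def calculate(text_input):
    counts = collections.defaultdict(int)
    q = multiprocessing.Queue()
    # Keep every worker so the loop below waits for all of them,
    # not only the last one started (and copes with an empty input list).
    processes = []
    for s in text_input:
        p = multiprocessing.Process(target=calc_one_string, args=(q, s))
        p.start()
        processes.append(p)
    # Drain the queue while any worker is still running or results remain.
    while any(p.is_alive() for p in processes) or not q.empty():
        try:
            k, v = q.get_nowait()
            counts[k] += v
        except queue.Empty:
            # without this, q.empty() sometimes turns True erroneously
            time.sleep(0.01)
    for p in processes:
        p.join()
    return dict(counts)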
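
The polling on `is_alive()` and `q.empty()` can be avoided if each worker signals completion explicitly. The following is a minimal sketch of that alternative, not the author's code: it assumes a hypothetical sentinel value `DONE` and hypothetical function names, and it uses threads with `queue.Queue` to stay self-contained. `multiprocessing.Queue` offers the same `put` and `get` calls.

```python
import collections
import queue
import threading

DONE = None  # hypothetical sentinel: each worker queues it once when finished


def count_into_queue(q, text):
    # Count lowercase ASCII letters in one text and queue (letter, count) pairs.
    counts = collections.Counter(ch for ch in text.lower() if 'a' <= ch <= 'z')
    for letter, n in counts.items():
        q.put((letter, n))
    q.put(DONE)


def calculate_with_sentinels(texts):
    q = queue.Queue()
    workers = [threading.Thread(target=count_into_queue, args=(q, t))
               for t in texts]
    for w in workers:
        w.start()
    totals = collections.defaultdict(int)
    remaining = len(workers)
    while remaining:
        item = q.get()  # blocks until a result or a sentinel arrives
        if item is DONE:
            remaining -= 1
        else:
            letter, n = item
            totals[letter] += n
    for w in workers:
        w.join()
    return dict(totals)
```

Because the consumer stops only after it has seen one sentinel per worker, it needs neither a sleep nor a check of whether the queue is momentarily empty.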

Community comments

billwanjohi

I figured the purest approach to this would use separate processes, not just threads (hence the multiprocessing library), and minimal shared state and locking (hence putting values on a queue rather than updating shared state).

I didn't reduce the queued letter counts across multiple threads, but I do begin to loop through them before all processes have exited, so it is to some degree "parallel."

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give Exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?