
lutostag's solution

to Parallel Letter Frequency in the Rust Track

Published at Oct 05 2019 · 0 comments

Count the frequency of letters in texts using parallel computation.

Parallelism is about doing things in parallel that can also be done sequentially. A common example is counting the frequency of letters. Create a function that returns the total frequency of each letter in a list of texts and that employs parallelism.

Parallel Letter Frequency in Rust

Learn more about concurrency in Rust in the "Fearless Concurrency" chapter of The Rust Programming Language.

Bonus

This exercise also includes a benchmark, with a sequential implementation as a baseline. You can compare your solution to the benchmark. Observe the effect different size inputs have on the performance of each. Can you surpass the benchmark using concurrent programming techniques?
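For orientation, a sequential baseline can be sketched as below. This is only an illustration of the idea: the name `sequential_frequency` is hypothetical, and the exercise's own benchmark code may differ.

```rust
use std::collections::HashMap;

/// Counts letters across all texts on a single thread.
/// Hypothetical baseline for comparison; not the exercise's benchmark code.
pub fn sequential_frequency(texts: &[&str]) -> HashMap<char, usize> {
    let mut counts = HashMap::new();
    for text in texts {
        // Lowercase first, then keep only alphabetic characters,
        // so 'A' and 'a' are counted together and digits/punctuation are skipped.
        for c in text.to_lowercase().chars().filter(|c| c.is_alphabetic()) {
            *counts.entry(c).or_insert(0) += 1;
        }
    }
    counts
}
```

A parallel version has to beat this loop plus the cost of starting threads and merging results, which is why small inputs often favor the sequential code.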

As of this writing, test::Bencher is unstable and only available on nightly Rust. Run the benchmarks with Cargo:

cargo bench

If you are using rustup.rs:

rustup run nightly cargo bench
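If a nightly toolchain is not at hand, a rough comparison is still possible on stable Rust with `std::time::Instant`. The helper below is a minimal sketch; `time_once` is a hypothetical name and is not part of the exercise or of `test::Bencher`.

```rust
use std::time::{Duration, Instant};

/// Runs `work` once and returns its result together with the elapsed wall-clock time.
/// Hypothetical helper: a single run is noisy, so treat the numbers as a rough guide only.
pub fn time_once<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}
```

Wrapping a call such as `time_once(|| frequency(&texts, 4))` and repeating it for several input sizes, in a release build, gives a first impression of where parallelism starts to pay off.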

Learn more about nightly Rust in the "Nightly Rust" appendix of The Rust Programming Language.

Rust Installation

Refer to the exercism help page for Rust installation and learning resources.

Writing the Code

Execute the tests with:

$ cargo test

All but the first test have been ignored. After you get the first test to pass, open the tests source file, which is located in the tests directory, remove the #[ignore] attribute from the next test, and get the tests to pass again. Each test is a function with a #[test] attribute above it. Continue until every test passes.
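The pattern looks roughly like this. The function `count_a` and both test names are hypothetical stand-ins, not part of this exercise's test suite.

```rust
/// Hypothetical function under test, standing in for your solution.
pub fn count_a(text: &str) -> usize {
    text.chars().filter(|c| c.eq_ignore_ascii_case(&'a')).count()
}

#[test]
fn runs_by_default() {
    assert_eq!(count_a("a"), 1);
}

#[test]
#[ignore] // Delete this line once the previous test passes.
fn skipped_until_unignored() {
    assert_eq!(count_a("aA"), 2);
}
```

With the attribute in place, `cargo test` reports the second test as ignored instead of running it.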

To run the ignored tests without editing the tests source file, use:

$ cargo test -- --ignored

To run a specific test, for example some_test, you can use:

$ cargo test some_test

If the specific test is ignored, use:

$ cargo test some_test -- --ignored

To learn more about Rust tests, refer to the online test documentation.

Make sure to read the Modules chapter if you haven't already. It will help you organize your files.

Further improvements

After you have solved the exercise, please consider using the additional utilities, described in the installation guide, to further refine your final solution.

To format your solution, run this inside the solution directory:

cargo fmt

To check whether your solution contains common inefficiencies, run this inside the solution directory:

cargo clippy --all-targets

Submitting the solution

Generally you should submit all files in which you implemented your solution (src/lib.rs in most cases). If you are using any external crates, please consider submitting the Cargo.toml file. This will make the review process faster and clearer.

Feedback, Issues, Pull Requests

The exercism/rust repository on GitHub is the home for all of the Rust exercises. If you have feedback about an exercise, or want to help implement new exercises, head over there and create an issue. Members of the rust track team are happy to help!

If you want to know more about Exercism, take a look at the contribution guide.

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

parallel-letter-frequency.rs

use std::collections::HashMap;

use parallel_letter_frequency as frequency;

// Poem by Friedrich Schiller. The corresponding music is the European Anthem.
const ODE_AN_DIE_FREUDE: [&str; 8] = [
    "Freude schöner Götterfunken",
    "Tochter aus Elysium,",
    "Wir betreten feuertrunken,",
    "Himmlische, dein Heiligtum!",
    "Deine Zauber binden wieder",
    "Was die Mode streng geteilt;",
    "Alle Menschen werden Brüder,",
    "Wo dein sanfter Flügel weilt.",
];

// Dutch national anthem
const WILHELMUS: [&str; 8] = [
    "Wilhelmus van Nassouwe",
    "ben ik, van Duitsen bloed,",
    "den vaderland getrouwe",
    "blijf ik tot in den dood.",
    "Een Prinse van Oranje",
    "ben ik, vrij, onverveerd,",
    "den Koning van Hispanje",
    "heb ik altijd geëerd.",
];

// American national anthem
const STAR_SPANGLED_BANNER: [&str; 8] = [
    "O say can you see by the dawn's early light,",
    "What so proudly we hailed at the twilight's last gleaming,",
    "Whose broad stripes and bright stars through the perilous fight,",
    "O'er the ramparts we watched, were so gallantly streaming?",
    "And the rockets' red glare, the bombs bursting in air,",
    "Gave proof through the night that our flag was still there;",
    "O say does that star-spangled banner yet wave,",
    "O'er the land of the free and the home of the brave?",
];

#[test]
fn test_no_texts() {
    assert_eq!(frequency::frequency(&[], 4), HashMap::new());
}

#[test]
#[ignore]
fn test_one_letter() {
    let mut hm = HashMap::new();
    hm.insert('a', 1);
    assert_eq!(frequency::frequency(&["a"], 4), hm);
}

#[test]
#[ignore]
fn test_case_insensitivity() {
    let mut hm = HashMap::new();
    hm.insert('a', 2);
    assert_eq!(frequency::frequency(&["aA"], 4), hm);
}

#[test]
#[ignore]
fn test_many_empty_lines() {
    let mut v = Vec::with_capacity(1000);
    for _ in 0..1000 {
        v.push("");
    }
    assert_eq!(frequency::frequency(&v[..], 4), HashMap::new());
}

#[test]
#[ignore]
fn test_many_times_same_text() {
    let mut v = Vec::with_capacity(1000);
    for _ in 0..1000 {
        v.push("abc");
    }
    let mut hm = HashMap::new();
    hm.insert('a', 1000);
    hm.insert('b', 1000);
    hm.insert('c', 1000);
    assert_eq!(frequency::frequency(&v[..], 4), hm);
}

#[test]
#[ignore]
fn test_punctuation_doesnt_count() {
    assert!(!frequency::frequency(&WILHELMUS, 4).contains_key(&','));
}

#[test]
#[ignore]
fn test_numbers_dont_count() {
    assert!(!frequency::frequency(&["Testing, 1, 2, 3"], 4).contains_key(&'1'));
}

#[test]
#[ignore]
fn test_all_three_anthems_1_worker() {
    let mut v = Vec::new();
    for anthem in [ODE_AN_DIE_FREUDE, WILHELMUS, STAR_SPANGLED_BANNER].iter() {
        for line in anthem.iter() {
            v.push(*line);
        }
    }
    let freqs = frequency::frequency(&v[..], 1);
    assert_eq!(freqs.get(&'a'), Some(&49));
    assert_eq!(freqs.get(&'t'), Some(&56));
    assert_eq!(freqs.get(&'ü'), Some(&2));
}

#[test]
#[ignore]
fn test_all_three_anthems_3_workers() {
    let mut v = Vec::new();
    for anthem in [ODE_AN_DIE_FREUDE, WILHELMUS, STAR_SPANGLED_BANNER].iter() {
        for line in anthem.iter() {
            v.push(*line);
        }
    }
    let freqs = frequency::frequency(&v[..], 3);
    assert_eq!(freqs.get(&'a'), Some(&49));
    assert_eq!(freqs.get(&'t'), Some(&56));
    assert_eq!(freqs.get(&'ü'), Some(&2));
}

src/lib.rs

use crossbeam_utils::thread;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    if input.is_empty() {
        return HashMap::new();
    }
    if worker_count == 0 {
        panic!("workers must be > 0");
    }

    // Ceiling division: at most `worker_count` chunks are produced.
    let mut chunk_size = input.len() / worker_count;
    if input.len() % worker_count != 0 {
        chunk_size += 1;
    }

    // Shared result map; each worker locks it once to merge its local counts.
    let total_count = Arc::new(Mutex::new(HashMap::new()));
    thread::scope(|s| {
        let mut pool = Vec::with_capacity(worker_count);

        for chunk in input.chunks(chunk_size) {
            let total_count = Arc::clone(&total_count);
            // Scoped threads may borrow `chunk` from `input` directly.
            pool.push(s.spawn(move |_| {
                // Count into a thread-local map first to keep lock contention low.
                let mut thread_count = HashMap::new();
                for string in chunk {
                    for c in string.to_lowercase().chars().filter(|c| c.is_alphabetic()) {
                        *thread_count.entry(c).or_insert(0) += 1;
                    }
                }
                let mut total_count = total_count.lock().unwrap();

                for (letter, count) in thread_count {
                    *total_count.entry(letter).or_insert(0) += count;
                }
            }));
        }
        for thread in pool {
            thread.join().unwrap();
        }
    })
    .unwrap();
    // All workers have finished, so this is the only remaining reference.
    Arc::try_unwrap(total_count).unwrap().into_inner().unwrap()
}

Cargo.toml

[package]
edition = "2018"
name = "parallel-letter-frequency"
version = "0.0.0"

[dependencies]
crossbeam-utils = "0.6"

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give Exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?
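
One way to explore the first question: the solution above has every worker merge its counts into a shared `Arc<Mutex<HashMap>>`. A possible alternative, sketched below, lets each worker return its own map and merges them on the calling thread, so no lock is needed. It assumes Rust 1.63 or later, where `std::thread::scope` is available in the standard library (it was not when this solution was published, which makes `crossbeam-utils` a reasonable choice at the time). The name `frequency_merge` is hypothetical, and this is not the author's code.

```rust
use std::collections::HashMap;
use std::thread;

/// Sketch: each worker returns its own map; the caller merges them after joining.
/// `frequency_merge` is a hypothetical name, not the author's function.
pub fn frequency_merge(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    assert!(worker_count > 0, "workers must be > 0");
    if input.is_empty() {
        return HashMap::new();
    }
    // Ceiling division so that at most `worker_count` chunks are produced.
    let chunk_size = (input.len() + worker_count - 1) / worker_count;

    thread::scope(|s| {
        // Spawn one scoped thread per chunk; scoped threads may borrow `input`.
        let handles: Vec<_> = input
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    let mut local: HashMap<char, usize> = HashMap::new();
                    for text in chunk {
                        for c in text.to_lowercase().chars().filter(|c| c.is_alphabetic()) {
                            *local.entry(c).or_insert(0) += 1;
                        }
                    }
                    local
                })
            })
            .collect();

        // Merge the per-thread maps on the calling thread; no lock is needed.
        let mut total: HashMap<char, usize> = HashMap::new();
        for handle in handles {
            for (letter, count) in handle.join().unwrap() {
                *total.entry(letter).or_insert(0) += count;
            }
        }
        total
    })
}
```

The trade-off is that merging happens sequentially at the end rather than as each worker finishes. Whether that is faster than the mutex approach depends on the input, so it is worth measuring rather than assuming.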