Avatar of benkenawell

benkenawell's solution

to Hamming in the Rust Track

Published at Oct 16 2019 · 0 comments
Instructions
Test suite
Solution

Calculate the Hamming Distance between two DNA strands.

Your body is made up of cells that contain DNA. Those cells regularly wear out and need replacing, which they achieve by dividing into daughter cells. In fact, the average human body experiences about 10 quadrillion cell divisions in a lifetime!

When cells divide, their DNA replicates too. Sometimes during this process mistakes happen and single pieces of DNA get encoded with the incorrect information. If we compare two strands of DNA and count the differences between them we can see how many mistakes occurred. This is known as the "Hamming Distance".

We read DNA using the letters C,A,G and T. Two strands might look like this:

GAGCCTACTAACGGGAT
CATCGTAATGACGGCCT
^ ^ ^  ^ ^    ^^

They have 7 differences, and therefore the Hamming Distance is 7.

The Hamming Distance is useful for lots of things in science, not just biology, so it's a nice phrase to be familiar with :)

Implementation notes

The Hamming distance is only defined for sequences of equal length, so an attempt to calculate it between sequences of different lengths should not work. The general handling of this situation (e.g., raising an exception vs returning a special value) may differ between languages.

Rust Installation

Refer to the exercism help page for Rust installation and learning resources.

Writing the Code

Execute the tests with:

$ cargo test

All but the first test have been ignored. After you get the first test to pass, open the tests source file which is located in the tests directory and remove the #[ignore] flag from the next test and get the tests to pass again. Each separate test is a function with #[test] flag above it. Continue, until you pass every test.

If you wish to run all ignored tests without editing the tests source file, use:

$ cargo test -- --ignored

To run a specific test, for example some_test, you can use:

$ cargo test some_test

If the specific test is ignored use:

$ cargo test some_test -- --ignored

To learn more about Rust tests refer to the online test documentation

Make sure to read the Modules chapter if you haven't already, it will help you with organizing your files.

Further improvements

After you have solved the exercise, please consider using the additional utilities, described in the installation guide, to further refine your final solution.

To format your solution, inside the solution directory use

cargo fmt

To see, if your solution contains some common ineffective use cases, inside the solution directory use

cargo clippy --all-targets

Submitting the solution

Generally you should submit all files in which you implemented your solution (src/lib.rs in most cases). If you are using any external crates, please consider submitting the Cargo.toml file. This will make the review process faster and clearer.

Feedback, Issues, Pull Requests

The exercism/rust repository on GitHub is the home for all of the Rust exercises. If you have feedback about an exercise, or want to help implement new exercises, head over there and create an issue. Members of the rust track team are happy to help!

If you want to know more about Exercism, take a look at the contribution guide.

Source

The Calculating Point Mutations problem at Rosalind http://rosalind.info/problems/hamm/

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

hamming.rs

use hamming;

fn process_distance_case(strand_pair: [&str; 2], expected_distance: Option<usize>) {
    assert_eq!(
        hamming::hamming_distance(strand_pair[0], strand_pair[1]),
        expected_distance
    );
}

#[test]
fn test_empty_strands() {
    process_distance_case(["", ""], Some(0));
}

#[test]
#[ignore]
/// disallow first strand longer
fn test_disallow_first_strand_longer() {
    process_distance_case(["AATG", "AAA"], None);
}

#[test]
#[ignore]
/// disallow second strand longer
fn test_disallow_second_strand_longer() {
    process_distance_case(["ATA", "AGTG"], None);
}

#[test]
#[ignore]
fn test_first_string_is_longer() {
    process_distance_case(["AAA", "AA"], None);
}

#[test]
#[ignore]
fn test_second_string_is_longer() {
    process_distance_case(["A", "AA"], None);
}

#[test]
#[ignore]
/// single letter identical strands
fn test_single_letter_identical_strands() {
    process_distance_case(["A", "A"], Some(0));
}

#[test]
#[ignore]
/// small distance
fn test_single_letter_different_strands() {
    process_distance_case(["G", "T"], Some(1));
}

#[test]
#[ignore]
/// long identical strands
fn test_long_identical_strands() {
    process_distance_case(["GGACTGAAATCTG", "GGACTGAAATCTG"], Some(0));
}

#[test]
#[ignore]
fn test_no_difference_between_identical_strands() {
    process_distance_case(["GGACTGA", "GGACTGA"], Some(0));
}

#[test]
#[ignore]
fn test_complete_hamming_distance_in_small_strand() {
    process_distance_case(["ACT", "GGA"], Some(3));
}

#[test]
#[ignore]
fn test_small_hamming_distance_in_the_middle_somewhere() {
    process_distance_case(["GGACG", "GGTCG"], Some(1));
}

#[test]
#[ignore]
fn test_larger_distance() {
    process_distance_case(["ACCAGGG", "ACTATGG"], Some(2));
}

#[test]
#[ignore]
/// large distance in off-by-one strand
fn test_long_different_strands() {
    process_distance_case(["GGACGGATTCTG", "AGGACGGATTCT"], Some(9));
}
/// Return the Hamming distance between the strings,
/// or None if the lengths are mismatched.
pub fn hamming_distance(s1: &str, s2: &str) -> Option<usize> {
    if s1.len() != s2.len() { return None; }
    // from here, we assume the strings are equal length
    
    let iter = s1.chars().zip(s2.chars());
    let dist = iter.fold(0, |acc, (x, y)| if x != y { acc + 1 } else { acc });
    Some(dist)
}

Community comments

Find this solution interesting? Ask the author a question to learn more.

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?