🎉 Exercism Research is now launched. Help Exercism, help science and have some fun at research.exercism.io 🎉
Avatar of remcopeereboom

remcopeereboom's solution

to Hamming in the Ruby Track

Published at Jul 13 2018 · 15 comments
Instructions
Test suite
Solution

Note:

This solution was written on an old version of Exercism. The tests below might not correspond to the solution code, and the exercise may have changed since this code was written.

Calculate the Hamming difference between two DNA strands.

A mutation is simply a mistake that occurs during the creation or copying of a nucleic acid, in particular DNA. Because nucleic acids are vital to cellular functions, mutations tend to cause a ripple effect throughout the cell. Although mutations are technically mistakes, a very rare mutation may equip the cell with a beneficial attribute. In fact, the macro effects of evolution are attributable by the accumulated result of beneficial microscopic mutations over many generations.

The simplest and most common type of nucleic acid mutation is a point mutation, which replaces one base with another at a single nucleotide.

By counting the number of differences between two homologous DNA strands taken from different genomes with a common ancestor, we get a measure of the minimum number of point mutations that could have occurred on the evolutionary path between the two strands.

This is called the 'Hamming distance'.

It is found by comparing two DNA strands and counting how many of the nucleotides are different from their equivalent in the other string.

GAGCCTACTAACGGGAT
CATCGTAATGACGGCCT
^ ^ ^  ^ ^    ^^

The Hamming distance between these two DNA strands is 7.

Implementation notes

The Hamming distance is only defined for sequences of equal length. This means that based on the definition, each language could deal with getting sequences of equal length differently.


For installation and learning resources, refer to the exercism help page.

For running the tests provided, you will need the Minitest gem. Open a terminal window and run the following command to install minitest:

gem install minitest

If you would like color output, you can require 'minitest/pride' in the test file, or note the alternative instruction, below, for running the test file.

Run the tests from the exercise directory using the following command:

ruby hamming_test.rb

To include color from the command line:

ruby -r minitest/pride hamming_test.rb

Source

The Calculating Point Mutations problem at Rosalind http://rosalind.info/problems/hamm/

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

hamming_test.rb

require 'minitest/autorun'
require_relative 'hamming'

# Common test data version: 2.0.1 f79dfd7
class HammingTest < Minitest::Test
  def test_empty_strands
    # skip
    assert_equal 0, Hamming.compute('', '')
  end

  def test_identical_strands
    skip
    assert_equal 0, Hamming.compute('A', 'A')
  end

  def test_long_identical_strands
    skip
    assert_equal 0, Hamming.compute('GGACTGA', 'GGACTGA')
  end

  def test_complete_distance_in_single_nucleotide_strands
    skip
    assert_equal 1, Hamming.compute('A', 'G')
  end

  def test_complete_distance_in_small_strands
    skip
    assert_equal 2, Hamming.compute('AG', 'CT')
  end

  def test_small_distance_in_small_strands
    skip
    assert_equal 1, Hamming.compute('AT', 'CT')
  end

  def test_small_distance
    skip
    assert_equal 1, Hamming.compute('GGACG', 'GGTCG')
  end

  def test_small_distance_in_long_strands
    skip
    assert_equal 2, Hamming.compute('ACCAGGG', 'ACTATGG')
  end

  def test_non_unique_character_in_first_strand
    skip
    assert_equal 1, Hamming.compute('AAG', 'AAA')
  end

  def test_non_unique_character_in_second_strand
    skip
    assert_equal 1, Hamming.compute('AAA', 'AAG')
  end

  def test_same_nucleotides_in_different_positions
    skip
    assert_equal 2, Hamming.compute('TAG', 'GAT')
  end

  def test_large_distance
    skip
    assert_equal 4, Hamming.compute('GATACA', 'GCATAA')
  end

  def test_large_distance_in_off_by_one_strand
    skip
    assert_equal 9, Hamming.compute('GGACGGATTCTG', 'AGGACGGATTCT')
  end

  def test_disallow_first_strand_longer
    skip
    assert_raises(ArgumentError) { Hamming.compute('AATG', 'AAA') }
  end

  def test_disallow_second_strand_longer
    skip
    assert_raises(ArgumentError) { Hamming.compute('ATA', 'AGTG') }
  end

  # Problems in exercism evolve over time, as we find better ways to ask
  # questions.
  # The version number refers to the version of the problem you solved,
  # not your solution.
  #
  # Define a constant named VERSION inside of the top level BookKeeping
  # module, which may be placed near the end of your file.
  #
  # In your file, it will look like this:
  #
  # module BookKeeping
  #   VERSION = 1 # Where the version number matches the one in the test.
  # end
  #
  # If you are curious, read more about constants on RubyDoc:
  # http://ruby-doc.org/docs/ruby-doc-bundle/UsersGuide/rg/constants.html

  def test_bookkeeping
    skip
    assert_equal 3, BookKeeping::VERSION
  end
end
module Hamming
  def self.compute(strand_1, strand_2)
    0.upto(strand_1.length).count { |i| strand_1[i] != strand_2[i] }
  end
end

Community comments

Find this solution interesting? Ask the author a question to learn more.
Avatar of kytrinyx

... and I see now that this is where you went before I even joined the party :)

I like this version a lot. It seems like the simplest to read (short, succinct) without sacrificing the performance characteristics of not having an additional array.

Avatar of remcopeereboom

Repost from version 3...

@kytrinyx Thanks for the suggestion (And all of exercism!). I had already tried to extract the common behaviour into its own method and found it was basically just #zip but for Strings. This did lead me to solution nr. 5 which is symmetric and does feel cleaner to me.

The question I have now is, does ruby's zip always create an array of pairs and then pass along the pairs to the block or does it skip array creation if it knows it has a block?

Avatar of deleted-github-user-6701528
deleted-github-user-6701528
commented over 5 years ago

Good question. It looks like it always creates pair arrays. [1, 2, 3, 4, 5].zip([5, 4, 3, 2, 1]) { |x| puts x.class } Array Array ...

Avatar of remcopeereboom

@sleeper-public True, but that is the pair you want from zip. It is O(1), because it always just a pair. My concern is more if it first creates the entire array of pairs and then iterates over that array instead of iterating simultaneously over the source arrays. I fear it might do this: [[1, 5], [2, 4], [3, 3], [4, 2], [5, 1]].each. That would be easier to implement but has space complexity O(n).

Avatar of deleted-github-user-6701528
deleted-github-user-6701528
commented over 5 years ago

Ahh, my mistake. Since it returns nil with a block I would think it generates them as it goes - at least I would hope so.

Avatar of wbajzek

Whoa, that's really cool. I'd have a hard time convincing myself to do it the same way, but it is more readable than I expected from such concise code.

Does it pass all the tests?

Avatar of remcopeereboom

Does it pass all the tests?

It should. At least it did at the time, but they've since changed the tests. I guess I need a version number constant these days.

Avatar of wbajzek

Ah, ok. It doesn't pass the tests, but it's probably because it was from an older version. I'm new here and didn't think of that.

Avatar of danijuyusuf

I'm new here, too. Seeing this very short code thrills me. I think it's only two lines short of passing the new tests.

Avatar of ests

I love it! But what about ArgumentError, hmm?

Avatar of kytrinyx

@ests This passed the tests as they were written at the time. We evolve the test suites as we think of interesting edge cases or good improvements.

Avatar of ests

@kytrinyx got it, thank you :)

Avatar of visokoo

In your solution, your upto.count method actually returns one more than whatever string is passed. So your last iteration is comparing nil to nil. If you subtract 1 from length, you'll also get the right answer with one less iteration.

Avatar of calebkeene

IMO it'd be nicer to read if you used a do block and named the index variable accordingly rather than using the classic i (even though it's obvious, it still reads better). Also the latest version requires that raise - not sure if you're sufficiently motivated to add it

Avatar of remcopeereboom

@calebkeene Thanks for the feedback. I prefer curly braces for one liners like these. I also like keeping things as succinct as possible. I use i for that reason and also because it works consistently in nested loops (you can use j, and k) for those. Mostly a matter of taste.

The new versions require an exception, but the one I coded to does not. I don't like that the tests force exceptions everywhere. Sometimes you want to keep the exceptions limited to the program boundaries. Having exceptions everywhere can document preconditions, but it can also make very simple code a lot more complicated than it needs to be. There can also be a lot of overhead when you end up checking the same conditions over and over. I would have liked the tests to create a little more awareness in the trade-offs in these decisions.

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?