remcopeereboom's solution

to Nucleotide Count in the Ruby Track

Published at Jul 13 2018 · 3 comments

Given a single stranded DNA string, compute how many times each nucleotide occurs in the string.

The genetic language of every living thing on the planet is DNA. DNA is a large molecule built from an extremely long sequence of individual elements called nucleotides. Four types occur in DNA; they differ only slightly and are represented by the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' for thymine.
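
To make the task concrete, here is a minimal sketch of the counting itself. The method name nucleotide_counts is hypothetical and is not the interface the test suite expects; it only illustrates tallying the four symbols and rejecting anything else.

```ruby
# Hypothetical helper, not part of the exercise's required API.
# Counts each of the four DNA symbols in a strand.
def nucleotide_counts(strand)
  # Start every symbol at zero so absent nucleotides still appear.
  counts = { 'A' => 0, 'C' => 0, 'G' => 0, 'T' => 0 }

  strand.each_char do |symbol|
    # Any character outside the four symbols is treated as invalid input.
    raise ArgumentError, "invalid nucleotide: #{symbol}" unless counts.key?(symbol)

    counts[symbol] += 1
  end

  counts
end

counts = nucleotide_counts('GATTACA')
puts counts['A'] # prints 3
puts counts['T'] # prints 2
```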

Here is an analogy:

  • twigs are to birds nests as
  • nucleotides are to DNA as
  • legos are to lego houses as
  • words are to sentences as...

For installation and learning resources, refer to the Exercism help page.

For running the tests provided, you will need the Minitest gem. Open a terminal window and run the following command to install minitest:

gem install minitest

If you would like color output, you can require 'minitest/pride' in the test file, or use the alternative command given below when running the test file.

Run the tests from the exercise directory using the following command:

ruby nucleotide_count_test.rb

To include color from the command line:

ruby -r minitest/pride nucleotide_count_test.rb

Source

The Calculating DNA Nucleotides problem at Rosalind http://rosalind.info/problems/dna/

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

nucleotide_count_test.rb

require 'minitest/autorun'
require_relative 'nucleotide_count'

class NucleotideTest < Minitest::Test
  def test_empty_dna_strand_has_no_adenosine
    assert_equal 0, Nucleotide.from_dna('').count('A')
  end

  def test_repetitive_cytidine_gets_counted
    skip
    assert_equal 5, Nucleotide.from_dna('CCCCC').count('C')
  end

  def test_counts_only_thymidine
    skip
    assert_equal 1, Nucleotide.from_dna('GGGGGTAACCCGG').count('T')
  end

  def test_counts_a_nucleotide_only_once
    skip
    dna = Nucleotide.from_dna('CGATTGGG')
    dna.count('T')
    dna.count('T')
    assert_equal 2, dna.count('T')
  end

  def test_empty_dna_strand_has_no_nucleotides
    skip
    expected = { 'A' => 0, 'T' => 0, 'C' => 0, 'G' => 0 }
    assert_equal expected, Nucleotide.from_dna('').histogram
  end

  def test_repetitive_sequence_has_only_guanosine
    skip
    expected = { 'A' => 0, 'T' => 0, 'C' => 0, 'G' => 8 }
    assert_equal expected, Nucleotide.from_dna('GGGGGGGG').histogram
  end

  def test_counts_all_nucleotides
    skip
    s = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
    dna = Nucleotide.from_dna(s)
    expected = { 'A' => 20, 'T' => 21, 'G' => 17, 'C' => 12 }
    assert_equal expected, dna.histogram
  end

  def test_validates_dna
    skip
    assert_raises ArgumentError do
      Nucleotide.from_dna('JOHNNYAPPLESEED')
    end
  end
end

nucleotide_count.rb

class Nucleotide
  attr_reader :string

  def initialize(string)
    fail ArgumentError if string =~ /[^ACTG]/

    @string = string
    @counts = Hash.new { |h, key| h[key] = string.count(key) }
  end

  def count(nucleotide)
    @counts[nucleotide]
  end

  def self.from_dna(string)
    Nucleotide.new(string)
  end

  def histogram
    count 'A'
    count 'T'
    count 'C'
    count 'G'

    @counts
  end
end

Community comments

monkbroc

Any way to avoid the duplication of the list of valid nucleotides (ACTG)?

remcopeereboom

@monkbroc Yeah, I should really put it in a data-structure of some sort. I could then also use it in the regexp. Thanks for the reminder. I'll try to remember to update it.
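
One way the suggestion above might look, as a sketch rather than the author's eventual update: the valid symbols live in a single constant, and both the validation regexp and the histogram are derived from it. The constant names NUCLEOTIDES and INVALID are assumptions made for this illustration.

```ruby
class Nucleotide
  # Single source of truth for the valid symbols (name is an assumption).
  NUCLEOTIDES = %w[A T C G].freeze
  # Built from the list above: matches any character outside it.
  INVALID = /[^#{NUCLEOTIDES.join}]/

  def self.from_dna(string)
    new(string)
  end

  def initialize(string)
    raise ArgumentError, 'invalid DNA strand' if string =~ INVALID

    @string = string
    # Same memoizing hash as the published solution.
    @counts = Hash.new { |h, key| h[key] = string.count(key) }
  end

  def count(nucleotide)
    @counts[nucleotide]
  end

  def histogram
    # Build the result from the shared list instead of naming each symbol.
    NUCLEOTIDES.map { |n| [n, count(n)] }.to_h
  end
end

p Nucleotide.from_dna('GGGGGGGG').histogram
```

Adding or removing a symbol then means touching only the constant.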

remcopeereboom

I also should be using new instead of Nucleotide.new - give derived classes a chance to do their magic.
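
A small sketch of why that matters. Inside a class method, self is whichever class received the call, so a bare new instantiates a subclass when from_dna is called on one, whereas Nucleotide.new always builds the base class. The subclass MitochondrialNucleotide is hypothetical and exists only for this illustration.

```ruby
class Nucleotide
  def self.from_dna(string)
    # self is the class that received the call, so a bare new
    # builds an instance of the subclass when one is used.
    new(string)
  end

  def initialize(string)
    raise ArgumentError, 'invalid DNA strand' if string =~ /[^ACTG]/

    @string = string
  end
end

# Hypothetical subclass, used only to illustrate the point.
class MitochondrialNucleotide < Nucleotide
end

puts MitochondrialNucleotide.from_dna('GATTACA').class
# prints MitochondrialNucleotide; with Nucleotide.new it would be Nucleotide
```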

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give Exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?
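
As one possible answer to the first question, offered as an observation rather than the author's stated intent: the memoizing hash caches whatever key it is asked for, and histogram returns that same hash. The sketch below reproduces the published class in condensed form and shows that a lookup for a non-nucleotide key then appears in the histogram. It also shows that String#count treats its argument as a set of characters, so a two-letter argument counts both.

```ruby
# Condensed copy of the published solution, unchanged in behavior.
class Nucleotide
  def self.from_dna(string)
    Nucleotide.new(string)
  end

  def initialize(string)
    fail ArgumentError if string =~ /[^ACTG]/

    @string = string
    @counts = Hash.new { |h, key| h[key] = string.count(key) }
  end

  def count(nucleotide)
    @counts[nucleotide]
  end

  def histogram
    %w[A T C G].each { |n| count(n) }
    @counts
  end
end

dna = Nucleotide.from_dna('GATTACA')

puts dna.count('X')           # prints 0, and caches 'X' => 0
puts dna.histogram.key?('X')  # prints true: the stray key leaks out
puts dna.count('AT')          # prints 5: counts every 'A' and every 'T'
```

Returning a fresh hash limited to the four symbols would avoid the leak, at the cost of rebuilding it on each call.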