# angelikatyborska's solution

## to Nucleotide Count in the Ruby Track

Published at Jul 13 2018 · 1 comment

Given a single stranded DNA string, compute how many times each nucleotide occurs in the string.

The genetic language of every living thing on the planet is DNA. DNA is a large molecule built from an extremely long sequence of individual elements called nucleotides. Four types exist in DNA; they differ only slightly and can be represented by the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' for thymine.

Here is an analogy:

• twigs are to birds nests as
• nucleotides are to DNA as
• legos are to lego houses as
• words are to sentences as...
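As a rough illustration of the task described above, counting can be sketched with `String#count`. The method name `nucleotide_counts` is hypothetical and is not the interface the test suite expects; the sketch also assumes the strand is already valid and simply ignores any other characters.

```ruby
# Hypothetical helper, not the API required by the exercise's tests.
def nucleotide_counts(strand)
  # String#count('A') returns how many times 'A' appears in the strand.
  # Nucleotides that never occur are reported with a count of 0.
  %w[A C G T].map { |nucleotide| [nucleotide, strand.count(nucleotide)] }.to_h
end

counts = nucleotide_counts('GATTACA')
counts.each { |nucleotide, total| puts "#{nucleotide}: #{total}" }
# For 'GATTACA' the counts are A: 3, C: 1, G: 1, T: 2
```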

For installation and learning resources, refer to the Exercism help page.

For running the tests provided, you will need the Minitest gem. Open a terminal window and run the following command to install minitest:

```shell
gem install minitest
```

If you would like color output, you can `require 'minitest/pride'` in the test file, or note the alternative instruction, below, for running the test file.

Run the tests from the exercise directory using the following command:

```shell
ruby nucleotide_count_test.rb
```

To include color from the command line:

```shell
ruby -r minitest/pride nucleotide_count_test.rb
```

## Source

The Calculating DNA Nucleotides problem at Rosalind: http://rosalind.info/problems/dna/

## Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

### nucleotide_count_test.rb

```ruby
require 'minitest/autorun'
require_relative 'nucleotide_count'

class NucleotideTest < Minitest::Test
  def test_empty_dna_strand_has_no_adenosine
    assert_equal 0, Nucleotide.from_dna('').count('A')
  end

  def test_repetitive_cytidine_gets_counted
    skip
    assert_equal 5, Nucleotide.from_dna('CCCCC').count('C')
  end

  def test_counts_only_thymidine
    skip
    assert_equal 1, Nucleotide.from_dna('GGGGGTAACCCGG').count('T')
  end

  def test_counts_a_nucleotide_only_once
    skip
    dna = Nucleotide.from_dna('CGATTGGG')
    dna.count('T')
    dna.count('T')
    assert_equal 2, dna.count('T')
  end

  def test_empty_dna_strand_has_no_nucleotides
    skip
    expected = { 'A' => 0, 'T' => 0, 'C' => 0, 'G' => 0 }
    assert_equal expected, Nucleotide.from_dna('').histogram
  end

  def test_repetitive_sequence_has_only_guanosine
    skip
    expected = { 'A' => 0, 'T' => 0, 'C' => 0, 'G' => 8 }
    assert_equal expected, Nucleotide.from_dna('GGGGGGGG').histogram
  end

  def test_counts_all_nucleotides
    skip
    s = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
    dna = Nucleotide.from_dna(s)
    expected = { 'A' => 20, 'T' => 21, 'G' => 17, 'C' => 12 }
    assert_equal expected, dna.histogram
  end

  def test_validates_dna
    skip
    assert_raises ArgumentError do
      Nucleotide.from_dna('JOHNNYAPPLESEED')
    end
  end
end
```
### nucleotide_count.rb

```ruby
class Nucleotide
  DNA_NUCLEOTIDES = %w(A C G T)
  REGEXP = Regexp.new('\A(' + DNA_NUCLEOTIDES.join('|') + ')*\z')

  def initialize(dna)
    if valid_dna?(dna)
      @dna = dna
    else
      raise ArgumentError, "#{ dna } is not a valid DNA string"
    end
  end

  def self.from_dna(dna)
    new(dna)
  end

  def histogram
    @histogram ||= @dna.chars.each_with_object(empty_histogram) do |nucleotide, histogram|
      histogram[nucleotide] += 1
    end
  end

  def count(nucleotide)
    histogram[nucleotide]
  end

  private

  def valid_dna?(string)
    string =~ REGEXP
  end

  def empty_histogram
    Hash[DNA_NUCLEOTIDES.collect { |nucleotide| [nucleotide, 0] }]
  end
end
```
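To make the validation step easier to follow, the sketch below rebuilds the same pattern outside the class and prints what it expands to. The local variable names and the character-class variant are illustrative assumptions, not part of the author's code.

```ruby
dna_nucleotides = %w(A C G T)

# Same construction as in the solution: an alternation of the four symbols,
# repeated zero or more times, anchored to the whole string with \A and \z.
regexp = Regexp.new('\A(' + dna_nucleotides.join('|') + ')*\z')
puts regexp.source

# A character class accepts the same strings (an alternative spelling,
# shown for comparison only).
char_class = /\A[ACGT]*\z/

# `=~` returns the match position, and 0 is truthy in Ruby, so the empty
# strand counts as valid DNA.
p(regexp =~ '') # prints 0

['', 'GATTACA', 'JOHNNYAPPLESEED', "ACGT\n"].each do |strand|
  puts "#{strand.inspect}: #{regexp.match?(strand)} / #{char_class.match?(strand)}"
end
```

Because `\z` matches only at the very end of the string, a strand with a trailing newline is rejected by both patterns.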

## Community comments

Solution Author commented about 5 years ago

My first instinct was to use `string.chars.all? { |char| DNA_NUCLEOTIDES.include?(char) }` to validate a DNA string, but I stopped to think about speed. I didn't know whether a regular expression would be faster, so I wrote this little benchmark. A regular expression was faster in every case I checked: for valid and for random strings, for a few really long strings and for many short strings. Sometimes it's 3 times faster, sometimes it's over 100 times faster.
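The benchmark mentioned in the comment is not reproduced on this page. A comparison along those lines could be sketched with Ruby's standard `Benchmark` module as follows; the lambda names, string length, and iteration count are assumptions chosen for illustration, and the timings will differ by machine and Ruby version.

```ruby
require 'benchmark'

nucleotides = %w[A C G T]
regexp = Regexp.new('\A(' + nucleotides.join('|') + ')*\z')

# Two validators: the character-by-character check and the regular expression.
by_chars  = ->(strand) { strand.chars.all? { |char| nucleotides.include?(char) } }
by_regexp = ->(strand) { regexp.match?(strand) }

valid   = 'ACGT' * 2_500 # 10,000 valid characters (assumed size)
invalid = valid + 'X'    # same strand with one invalid character appended

# Sanity check: both validators should agree before they are timed.
[valid, invalid].each do |strand|
  raise 'validators disagree' unless by_chars.call(strand) == by_regexp.call(strand)
end

Benchmark.bm(12) do |bench|
  bench.report('chars.all?') { 100.times { by_chars.call(valid) } }
  bench.report('regexp')     { 100.times { by_regexp.call(valid) } }
end
```

The sketch only times one long valid strand; covering random strings and many short strings, as the comment describes, would need additional `report` lines.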

### What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give Exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

• What compromises have been made?
• Are there new concepts here that you could read more about to improve your understanding?