Given a single-stranded DNA string, compute how many times each nucleotide occurs in the string.
The genetic language of every living thing on the planet is DNA. DNA is a large molecule built from an extremely long sequence of individual elements called nucleotides. Four types of nucleotides exist in DNA. They differ only slightly and can be represented by the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' for thymine.
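As a rough illustration of the counting itself, here is a minimal sketch using Ruby's Enumerable#tally (available since Ruby 2.7). The helper name nucleotide_counts is hypothetical and is not part of the exercise's expected API. Unlike the solution further down, it performs no validation: characters other than A, C, G and T are simply left out of the result.

```ruby
# Hypothetical helper (not the exercise's API): counts A, C, G and T in a strand.
# Requires Ruby 2.7+ for Enumerable#tally. No validation is performed;
# characters other than A/C/G/T are tallied but dropped from the result.
def nucleotide_counts(strand)
  counts = strand.each_char.tally # character => number of occurrences
  # Ensure all four nucleotides are present, defaulting to 0 when absent.
  %w(A C G T).to_h { |nucleotide| [nucleotide, counts.fetch(nucleotide, 0)] }
end

p nucleotide_counts('GATTACA') # A: 3, C: 1, G: 1, T: 2
p nucleotide_counts('')        # all four counts are 0
```

Filling in missing nucleotides with 0 matters because the expected histograms in the tests always contain all four keys, even for an empty strand.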
Here is an analogy:
For installation and learning resources, refer to the Exercism help page.
To run the provided tests, you will need the Minitest gem. Open a terminal window and run the following command to install it:
gem install minitest
If you would like color output, you can require 'minitest/pride' in the test file, or use the alternative command below when running the test file.
Run the tests from the exercise directory using the following command:
ruby nucleotide_count_test.rb
To include color from the command line:
ruby -r minitest/pride nucleotide_count_test.rb
Source: the Calculating DNA Nucleotides problem at Rosalind, http://rosalind.info/problems/dna/
It's possible to submit an incomplete solution so you can see how others have completed the exercise.
require 'minitest/autorun'
require_relative 'nucleotide_count'

class NucleotideTest < Minitest::Test
  def test_empty_dna_strand_has_no_adenosine
    assert_equal 0, Nucleotide.from_dna('').count('A')
  end

  def test_repetitive_cytidine_gets_counted
    skip
    assert_equal 5, Nucleotide.from_dna('CCCCC').count('C')
  end

  def test_counts_only_thymidine
    skip
    assert_equal 1, Nucleotide.from_dna('GGGGGTAACCCGG').count('T')
  end

  def test_counts_a_nucleotide_only_once
    skip
    dna = Nucleotide.from_dna('CGATTGGG')
    dna.count('T')
    dna.count('T')
    assert_equal 2, dna.count('T')
  end

  def test_empty_dna_strand_has_no_nucleotides
    skip
    expected = { 'A' => 0, 'T' => 0, 'C' => 0, 'G' => 0 }
    assert_equal expected, Nucleotide.from_dna('').histogram
  end

  def test_repetitive_sequence_has_only_guanosine
    skip
    expected = { 'A' => 0, 'T' => 0, 'C' => 0, 'G' => 8 }
    assert_equal expected, Nucleotide.from_dna('GGGGGGGG').histogram
  end

  def test_counts_all_nucleotides
    skip
    s = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
    dna = Nucleotide.from_dna(s)
    expected = { 'A' => 20, 'T' => 21, 'G' => 17, 'C' => 12 }
    assert_equal expected, dna.histogram
  end

  def test_validates_dna
    skip
    assert_raises ArgumentError do
      Nucleotide.from_dna('JOHNNYAPPLESEED')
    end
  end
end
class Nucleotide
  DNA_NUCLEOTIDES = %w(A C G T)
  REGEXP = Regexp.new('\A(' + DNA_NUCLEOTIDES.join('|') + ')*\z')

  def initialize(dna)
    if valid_dna?(dna)
      @dna = dna
    else
      raise ArgumentError, "#{ dna } is not a valid DNA string"
    end
  end

  def self.from_dna(dna)
    new(dna)
  end

  def histogram
    @histogram ||= @dna.chars.each_with_object(empty_histogram) do |nucleotide, histogram|
      histogram[nucleotide] += 1
    end
  end

  def count(nucleotide)
    histogram[nucleotide]
  end

  private

  def valid_dna?(string)
    string =~ REGEXP
  end

  def empty_histogram
    Hash[DNA_NUCLEOTIDES.collect { |nucleotide| [nucleotide, 0] }]
  end
end
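The solution's REGEXP expands to the alternation \A(A|C|G|T)*\z. The sketch below, with illustrative constant names that are not part of the solution, compares it with an equivalent character-class pattern and with a line-anchored variant. The \A and \z anchors matter: in Ruby, ^ and $ match at line boundaries, so a line-anchored pattern accepts a string in which only the first line is valid DNA. (\Z would also be too lenient, since it allows a trailing newline.)

```ruby
# Illustrative names only. The first pattern is built the same way the
# solution builds REGEXP from its list of nucleotides.
ALTERNATION_PATTERN = Regexp.new('\A(' + %w(A C G T).join('|') + ')*\z')
# An equivalent pattern written as a character class.
CHAR_CLASS_PATTERN = /\A[ACGT]*\z/
# A weaker variant: ^ and $ match at the start/end of any line in Ruby.
LINE_ANCHORED_PATTERN = /^[ACGT]*$/

samples = ['', 'GATTACA', 'JOHNNYAPPLESEED', "ACGT\nXYZ", "ACGT\n"]

samples.each do |sample|
  puts "#{sample.inspect.ljust(20)} " \
       "alternation=#{ALTERNATION_PATTERN.match?(sample)} " \
       "char_class=#{CHAR_CLASS_PATTERN.match?(sample)} " \
       "line_anchored=#{LINE_ANCHORED_PATTERN.match?(sample)}"
end
```

The alternation and character-class patterns agree on every sample; only the line-anchored one accepts "ACGT\nXYZ" and "ACGT\n". Note also that both anchored patterns accept the empty string, which is what lets Nucleotide.from_dna('') succeed in the tests.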
Community comments
My first instinct was to use string.chars.all? { |char| DNA_NUCLEOTIDES.include?(char) } to validate a DNA string, but I stopped to think about speed. I didn't know whether a regular expression would be faster, so I wrote a little benchmark. A regular expression was faster in all the cases I checked: for valid and for random strings, for a few really long strings, and for many short strings. Sometimes it was 3 times faster, sometimes over 100 times faster.
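The commenter's benchmark is not included on this page. The following is only a hypothetical sketch of how such a comparison could be set up with Ruby's standard Benchmark module. The input sizes, the random seed and the character-class pattern (equivalent to the solution's alternation-based REGEXP) are all assumptions, and no timings are claimed here, since they depend on the machine and the Ruby version.

```ruby
require 'benchmark'

# Assumed setup: the four DNA letters and a character-class pattern
# equivalent to the solution's REGEXP.
NUCLEOTIDES = %w(A C G T).freeze
PATTERN = /\A[ACGT]*\z/

# The "first instinct" approach from the comment: check every character.
def valid_with_all?(string)
  string.chars.all? { |char| NUCLEOTIDES.include?(char) }
end

# The regular-expression approach.
def valid_with_regexp?(string)
  PATTERN.match?(string)
end

# Hypothetical inputs; sizes and seed are arbitrary, not the commenter's data.
rng = Random.new(42)
letters = ('A'..'Z').to_a
valid_long  = Array.new(100_000) { NUCLEOTIDES.sample(random: rng) }.join
random_long = Array.new(100_000) { letters.sample(random: rng) }.join
many_short  = Array.new(10_000) { Array.new(20) { NUCLEOTIDES.sample(random: rng) }.join }

Benchmark.bm(20) do |x|
  x.report('all? valid long')    { 10.times { valid_with_all?(valid_long) } }
  x.report('regexp valid long')  { 10.times { valid_with_regexp?(valid_long) } }
  x.report('all? random long')   { 10.times { valid_with_all?(random_long) } }
  x.report('regexp random long') { 10.times { valid_with_regexp?(random_long) } }
  x.report('all? many short')    { many_short.each { |s| valid_with_all?(s) } }
  x.report('regexp many short')  { many_short.each { |s| valid_with_regexp?(s) } }
end
```

One plausible contributor to the gap on random strings is that string.chars builds an array of the whole string before all? can stop at the first invalid character, whereas the anchored regular expression fails as soon as it reaches one.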