Avatar of daveyarwood

daveyarwood's solution

to Nucleotide Count in the D Track

Published at Jul 13 2018 · 0 comments
Instructions
Test suite
Solution

Given a single stranded DNA string, compute how many times each nucleotide occurs in the string.

The genetic language of every living thing on the planet is DNA. DNA is a large molecule that is built from an extremely long sequence of individual elements called nucleotides. 4 types exist in DNA and these differ only slightly and can be represented as the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' thymine.

Here is an analogy:

  • twigs are to birds nests as
  • nucleotides are to DNA as
  • legos are to lego houses as
  • words are to sentences as...

Getting Started

Make sure you have read D page on exercism.io. This covers the basic information on setting up the development environment expected by the exercises.

Passing the Tests

Get the first test compiling, linking and passing by following the three rules of test-driven development. Create just enough structure by declaring namespaces, functions, classes, etc., to satisfy any compiler errors and get the test to fail. Then write just enough code to get the test to pass. Once you've done that, uncomment the next test by moving the following line past the next test.

static if (all_tests_enabled)

This may result in compile errors as new constructs may be invoked that you haven't yet declared or defined. Again, fix the compile errors minimally to get a failing test, then change the code minimally to pass the test, refactor your implementation for readability and expressiveness and then go on to the next test.

Try to use standard D facilities in preference to writing your own low-level algorithms or facilities by hand. DRefLanguage and DReference are references to the D language and D standard library.

Source

The Calculating DNA Nucleotides_problem at Rosalind http://rosalind.info/problems/dna/

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

nucleotide_count.d

module nucleotide_count;

import std.string;
import std.array;
import std.algorithm.sorting: sort;
import std.algorithm.comparison: equal;

unittest
{

// test associative array equality
bool aaEqual (const ulong[char] lhs, const ulong[char] rhs)
{
	auto lhs_pairs = lhs.byKeyValue.array;
	auto rhs_pairs = rhs.byKeyValue.array;
	lhs_pairs.sort!(q{a.key < b.key});
	rhs_pairs.sort!(q{a.key < b.key});

	return equal!("a.key == b.key && a.value == b.value")(lhs_pairs, rhs_pairs);
}

immutable int allTestsEnabled = 0;

// has_no_nucleotides
{
	const Counter dna = new Counter("");
	const ulong[char] expected = ['A': 0, 'T': 0, 'C': 0, 'G':0];

	auto actual = dna.nucleotideCounts();

	assert(aaEqual(expected, actual));
}

static if (allTestsEnabled)
{
// has_no_adenosine
{
	const Counter dna = new Counter("");

	assert(dna.countOne('A') == 0);
}

// repetitive_cytidine_gets_count
{
	const Counter dna = new Counter("CCCCC");

	assert(dna.countOne('C') == 5);
}

// repetitive_sequence_has_only_guanosine
{
	const Counter dna = new Counter("GGGGGGGG");
	const ulong[char] expected = ['A': 0, 'T': 0, 'C': 0, 'G': 8];

	const auto actual = dna.nucleotideCounts();

	assert(aaEqual(expected, actual));
}

// count_only_thymidine
{
	const Counter dna = new Counter("GGGGTAACCCGG");

	assert(dna.countOne('T') == 1);
}

// count_a_nucleotide_only_once
{

	const Counter dna = new Counter("GGTTGG");

	dna.countOne('T');

	assert(dna.countOne('T') == 2);
}

// validates_nucleotides
{
	import std.exception : assertThrown;

	const Counter dna = new Counter("GGTTGG");

	assertThrown(dna.countOne('X'));
}

// count_all_nucleotides)
{
	const Counter dna = new Counter("AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC");
	const ulong[char] expected = ['A': 20, 'T': 21, 'G': 17, 'C': 12 ];

	auto actual = dna.nucleotideCounts();

	assert(aaEqual(expected, actual));
}
}

}
module nucleotide_count;

import std.string;
import std.array;
import std.algorithm: canFind;
import std.algorithm.sorting: sort;
import std.algorithm.comparison: equal;

class Counter {
  static immutable VALID_NUCLEOTIDES = ['G', 'C', 'T', 'A'];

  string strand;
  ulong[char] counts;

  this(string s) {
    strand = s;

    ulong[char] cs;

    foreach(char c; VALID_NUCLEOTIDES) {
      cs[c] = 0;
    }

    foreach(char c; s) {
      cs[c] += 1;
    }

    counts = cs;
  }

  const(ulong[char]) nucleotideCounts() const {
    return counts;
  }

  ulong countOne(char nucleotide) const {
    if (!VALID_NUCLEOTIDES.canFind(nucleotide))
      throw new Exception("Invalid nucleotide.");

    return counts[nucleotide];
  }
}

unittest
{

// test associative array equality
bool aaEqual (const ulong[char] lhs, const ulong[char] rhs)
{
	auto lhs_pairs = lhs.byKeyValue.array;
	auto rhs_pairs = rhs.byKeyValue.array;
	lhs_pairs.sort!(q{a.key < b.key});
	rhs_pairs.sort!(q{a.key < b.key});

	return equal!("a.key == b.key && a.value == b.value")(lhs_pairs, rhs_pairs);
}

immutable int allTestsEnabled = 1;

// has_no_nucleotides
{
	const Counter dna = new Counter("");
	const ulong[char] expected = ['A': 0, 'T': 0, 'C': 0, 'G':0];

	auto actual = dna.nucleotideCounts();

	assert(aaEqual(expected, actual));
}

static if (allTestsEnabled)
{
// has_no_adenosine
{
	const Counter dna = new Counter("");

	assert(dna.countOne('A') == 0);
}

// repetitive_cytidine_gets_count
{
	const Counter dna = new Counter("CCCCC");

	assert(dna.countOne('C') == 5);
}

// repetitive_sequence_has_only_guanosine
{
	const Counter dna = new Counter("GGGGGGGG");
	const ulong[char] expected = ['A': 0, 'T': 0, 'C': 0, 'G': 8];

	const auto actual = dna.nucleotideCounts();

	assert(aaEqual(expected, actual));
}

// count_only_thymidine
{
	const Counter dna = new Counter("GGGGTAACCCGG");

	assert(dna.countOne('T') == 1);
}

// count_a_nucleotide_only_once
{

	const Counter dna = new Counter("GGTTGG");

	dna.countOne('T');

	assert(dna.countOne('T') == 2);
}

// validates_nucleotides
{
	import std.exception : assertThrown;

	const Counter dna = new Counter("GGTTGG");

	assertThrown(dna.countOne('X'));
}

// count_all_nucleotides)
{
	const Counter dna = new Counter("AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC");
	const ulong[char] expected = ['A': 20, 'T': 21, 'G': 17, 'C': 12 ];

	auto actual = dna.nucleotideCounts();

	assert(aaEqual(expected, actual));
}
}

}

Community comments

Find this solution interesting? Ask the author a question to learn more.

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?