Avatar of artemkorsakov

artemkorsakov's solution

to Nucleotide Count in the Go Track

Published at Feb 25 2019 · 0 comments
Instructions
Test suite
Solution

Given a single stranded DNA string, compute how many times each nucleotide occurs in the string.

The genetic language of every living thing on the planet is DNA. DNA is a large molecule that is built from an extremely long sequence of individual elements called nucleotides. 4 types exist in DNA and these differ only slightly and can be represented as the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' thymine.

Here is an analogy:

  • twigs are to birds nests as
  • nucleotides are to DNA as
  • legos are to lego houses as
  • words are to sentences as...

Implementation

You should define a custom type 'DNA' with a function 'Counts' that outputs two values:

  • a frequency count for the given DNA strand
  • an error (if there are invalid nucleotides)

Which is a good type for a DNA strand ?

Which is the best Go types to represent the output values ?

Take a look at the test cases to get a hint about what could be the possible inputs.

note about the tests

You may be wondering about the cases_test.go file. We explain it in the leap exercise.

Running the tests

To run the tests run the command go test from within the exercise directory.

If the test suite contains benchmarks, you can run these with the --bench and --benchmem flags:

go test -v --bench . --benchmem

Keep in mind that each reviewer will run benchmarks on a different machine, with different specs, so the results from these benchmark tests may vary.

Further information

For more detailed information about the Go track, including how to get help if you're having trouble, please visit the exercism.io Go language page.

Source

The Calculating DNA Nucleotides_problem at Rosalind http://rosalind.info/problems/dna/

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

cases_test.go

package dna

// Source: exercism/problem-specifications
// Commit: 879a096 nucleotide-count: Apply new "input" policy
// Problem Specifications Version: 1.3.0

// count all nucleotides in a strand
var testCases = []struct {
	description   string
	strand        string
	expected      Histogram
	errorExpected bool
}{
	{
		description: "empty strand",
		strand:      "",
		expected:    Histogram{'A': 0, 'C': 0, 'G': 0, 'T': 0},
	},
	{
		description: "can count one nucleotide in single-character input",
		strand:      "G",
		expected:    Histogram{'A': 0, 'C': 0, 'G': 1, 'T': 0},
	},
	{
		description: "strand with repeated nucleotide",
		strand:      "GGGGGGG",
		expected:    Histogram{'A': 0, 'C': 0, 'G': 7, 'T': 0},
	},
	{
		description: "strand with multiple nucleotides",
		strand:      "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC",
		expected:    Histogram{'A': 20, 'C': 12, 'G': 17, 'T': 21},
	},
	{
		description:   "strand with invalid nucleotides",
		strand:        "AGXXACT",
		errorExpected: true,
	},
}

nucleotide_count_test.go

package dna

import (
	"reflect"
	"testing"
)

func TestCounts(t *testing.T) {
	for _, tc := range testCases {
		dna := DNA(tc.strand)
		s, err := dna.Counts()
		if tc.errorExpected {
			if err == nil {
				t.Fatalf("FAIL: %s\nCounts(%q)\nExpected error\nActual: %#v",
					tc.description, tc.strand, s)
			}
		} else if err != nil {
			t.Fatalf("FAIL: %s\nCounts(%q)\nExpected: %#v\nGot error: %q",
				tc.description, tc.strand, tc.expected, err)
		} else if !reflect.DeepEqual(s, tc.expected) {
			t.Fatalf("FAIL: %s\nCounts(%q)\nExpected: %#v\nActual: %#v",
				tc.description, tc.strand, tc.expected, s)
		}
		t.Logf("PASS: %s", tc.description)
	}
}
package dna

import (
	"errors"
	"strings"
)

// Histogram is a mapping from nucleotide to its count in given DNA.
type Histogram map[rune]int

// DNA is a list of nucleotides.
type DNA string

// Counts generates a histogram of valid nucleotides in the given DNA.
// Returns an error if d contains an invalid nucleotide.
func (d DNA) Counts() (Histogram, error) {
	var h Histogram
	h = make(map[rune]int)
	sumA := strings.Count(string(d), "A")
	h['A'] = sumA
	sumC := strings.Count(string(d), "C")
	h['C'] = sumC
	sumG := strings.Count(string(d), "G")
	h['G'] = sumG
	sumT := strings.Count(string(d), "T")
	h['T'] = sumT
	if sumA+sumC+sumG+sumT == len(d) {
		return h, nil
	}
	
	return h, errors.New("invalid nucleotide")
}

Community comments

Find this solution interesting? Ask the author a question to learn more.

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?