Avatar of agbell

agbell's solution

to Nucleotide Count in the Haskell Track

Published at Jul 13 2018 · 0 comments
Instructions
Test suite
Solution

Note:

This solution was written on an old version of Exercism. The tests below might not correspond to the solution code, and the exercise may have changed since this code was written.

Given a single stranded DNA string, compute how many times each nucleotide occurs in the string.

The genetic language of every living thing on the planet is DNA. DNA is a large molecule that is built from an extremely long sequence of individual elements called nucleotides. 4 types exist in DNA and these differ only slightly and can be represented as the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' thymine.

Here is an analogy:

  • twigs are to birds nests as
  • nucleotides are to DNA as
  • legos are to lego houses as
  • words are to sentences as...

Getting Started

For installation and learning resources, refer to the exercism help page.

Running the tests

To run the test suite, execute the following command:

stack test

If you get an error message like this...

No .cabal file found in directory

You are probably running an old stack version and need to upgrade it.

Otherwise, if you get an error message like this...

No compiler found, expected minor version match with...
Try running "stack setup" to install the correct GHC...

Just do as it says and it will download and install the correct compiler version:

stack setup

Running GHCi

If you want to play with your solution in GHCi, just run the command:

stack ghci

Feedback, Issues, Pull Requests

The exercism/haskell repository on GitHub is the home for all of the Haskell exercises.

If you have feedback about an exercise, or want to help implementing a new one, head over there and create an issue. We'll do our best to help you!

Source

The Calculating DNA Nucleotides_problem at Rosalind http://rosalind.info/problems/dna/

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

Tests.hs

{-# OPTIONS_GHC -fno-warn-type-defaults #-}

import Data.Either       (isLeft)
import Data.Map          (fromList)
import Test.Hspec        (Spec, describe, it, shouldBe, shouldSatisfy)
import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)

import DNA (nucleotideCounts)

main :: IO ()
main = hspecWith defaultConfig {configFastFail = True} specs

specs :: Spec
specs = do

          let x `matchesMap` y = x `shouldBe` (Right . fromList) y

          describe "nucleotideCounts" $ do

            it "empty dna strand has no nucleotides" $
              nucleotideCounts "" `matchesMap` [ ('A', 0)
                                               , ('C', 0)
                                               , ('G', 0)
                                               , ('T', 0) ]

            it "can count one nucleotide in single-character input" $
              nucleotideCounts "G" `matchesMap` [ ('A', 0)
                                                , ('C', 0)
                                                , ('G', 1)
                                                , ('T', 0) ]

            it "repetitive-sequence-has-only-guanosine" $
              nucleotideCounts "GGGGGGGG" `matchesMap` [ ('A', 0)
                                                       , ('C', 0)
                                                       , ('G', 8)
                                                       , ('T', 0) ]

            it "counts all nucleotides" $
              nucleotideCounts "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
              `matchesMap` [ ('A', 20)
                           , ('C', 12)
                           , ('G', 17)
                           , ('T', 21) ]

            it "validates strand" $
              nucleotideCounts "AGXXACT" `shouldSatisfy` isLeft
module DNA (count, nucleotideCounts)
where

import qualified Data.Map.Strict as Map
import Data.List

type Count = Map.Map Char Int

count :: Char -> String -> Int
count c cs = Map.findWithDefault err c $ nucleotideCounts cs
  where err = error $ "invalid nucleotide " ++ show c

nucleotideCounts :: String -> Count
nucleotideCounts = foldl' (accumulate 1) empty

empty :: Count
empty = foldl' (accumulate 0) Map.empty "ACGT"

accumulate :: Int -> Count -> Char -> Count
accumulate i m c = Map.insertWith (+) c i m

Community comments

Find this solution interesting? Ask the author a question to learn more.

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?