w1zeman1p's solution

to Nucleotide Count in the Elixir Track

Published at Jul 13 2018 · 5 comments

Note:

This solution was written on an old version of Exercism. The tests below might not correspond to the solution code, and the exercise may have changed since this code was written.

Given a single stranded DNA string, compute how many times each nucleotide occurs in the string.

The genetic language of every living thing on the planet is DNA. DNA is a large molecule built from an extremely long sequence of individual elements called nucleotides. Four types exist in DNA; they differ only slightly and can be represented by the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' for thymine.
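The task is essentially a frequency count over a charlist. Below is a minimal single-pass sketch, assuming Elixir 1.10 or later for Enum.frequencies/1. The module name NucleotideSketch is hypothetical, and this is not the approach the solution further down takes (that solution calls count/2 once per nucleotide). The sketch also does no validation, so any other character simply shows up as an extra key.

```elixir
defmodule NucleotideSketch do
  # Hypothetical helper: counts A, C, G and T in a single pass over a charlist.
  @zero %{?A => 0, ?C => 0, ?G => 0, ?T => 0}

  def histogram(strand) do
    # Enum.frequencies/1 (Elixir 1.10+) only returns characters that occur,
    # so merge its result into a map of zeros to keep all four keys.
    # No validation: an invalid character would appear as an extra key.
    Map.merge(@zero, Enum.frequencies(strand))
  end
end

IO.inspect(NucleotideSketch.histogram('GGGGGTAACCCGG'))
```

Merging into a zero-filled map is what makes the empty-strand case return all four keys instead of an empty map.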

Here is an analogy:

  • twigs are to bird nests as
  • nucleotides are to DNA as
  • LEGO bricks are to LEGO houses as
  • words are to sentences as...

Running tests

Execute the tests with:

$ elixir nucleotide_count_test.exs

Pending tests

In the test suites, all but the first test have been skipped.

Once you get a test passing, you can unskip the next one by commenting out the relevant @tag :pending with a # symbol.

For example:

# @tag :pending
test "shouting" do
  assert Bob.hey("WATCH OUT!") == "Whoa, chill out!"
end

Or, you can enable all the tests by commenting out the ExUnit.configure line in the test suite.

# ExUnit.configure exclude: :pending, trace: true

For more detailed information about the Elixir track, please see the help page.

Source

The Calculating DNA Nucleotides problem at Rosalind http://rosalind.info/problems/dna/

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

nucleotide_count_test.exs

if !System.get_env("EXERCISM_TEST_EXAMPLES") do
  Code.load_file("nucleotide_count.exs", __DIR__)
end

ExUnit.start()
ExUnit.configure(exclude: :pending, trace: true)

defmodule NucleotideCountTest do
  use ExUnit.Case

  # @tag :pending
  test "empty dna string has no adenine" do
    assert NucleotideCount.count('', ?A) == 0
  end

  @tag :pending
  test "repetitive cytosine gets counted" do
    assert NucleotideCount.count('CCCCC', ?C) == 5
  end

  @tag :pending
  test "counts only thymine" do
    assert NucleotideCount.count('GGGGGTAACCCGG', ?T) == 1
  end

  @tag :pending
  test "empty dna string has no nucleotides" do
    expected = %{?A => 0, ?T => 0, ?C => 0, ?G => 0}
    assert NucleotideCount.histogram('') == expected
  end

  @tag :pending
  test "repetitive sequence has only guanine" do
    expected = %{?A => 0, ?T => 0, ?C => 0, ?G => 8}
    assert NucleotideCount.histogram('GGGGGGGG') == expected
  end

  @tag :pending
  test "counts all nucleotides" do
    s = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
    expected = %{?A => 20, ?T => 21, ?C => 12, ?G => 17}
    assert NucleotideCount.histogram(s) == expected
  end
end

Solution

defmodule DNA do
  @nucleotides 'ACTG'

  # Returns the nucleotide unchanged when it is one of A, C, T or G;
  # any other character raises ArgumentError.
  @spec validate_one(char) :: char
  def validate_one(n) when n in @nucleotides, do: n
  def validate_one(n) when n not in @nucleotides, do: raise ArgumentError

  @doc """
  Counts individual nucleotides in a DNA strand.

  ## Examples

  iex> DNA.count('AATAA', ?A)
  4

  iex> DNA.count('AATAA', ?T)
  1
  """
  @spec count([char], char) :: non_neg_integer
  def count(strand, nucleotide) do
    # Validate the requested nucleotide first, then every character in the strand.
    validate_one(nucleotide)
    Enum.count(strand, fn n -> validate_one(n) == nucleotide end)
  end

  @doc """
  Returns a summary of counts by nucleotide.

  ## Examples

  iex> DNA.histogram('AATAA')
  %{?A => 4, ?T => 1, ?C => 0, ?G => 0}
  """
  @spec histogram([char]) :: map
  def histogram(strand) do
    # One entry per nucleotide, including those with a zero count.
    @nucleotides
    |> Enum.reduce(%{}, fn n, hist ->
      Map.put(hist, n, count(strand, n))
    end)
  end
end

Community comments

TFarla

Using the guard clause to pattern match is a step in the right direction. But raising an ArgumentError is not what we usually do in Elixir. I get it, and in other languages it's great to do this. In Elixir, however, we prefer to let it crash.

Take a GenServer, for example. When we start one with GenServer.start_link, it returns {:ok, pid}; if the creation fails, it returns {:error, reason}. This lets us pattern match on the return value and make powerful assertions based on it.
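A minimal sketch of the return-value pattern described above, assuming a hypothetical Counter GenServer whose init/1 refuses non-integer state. When init/1 returns {:stop, reason}, GenServer.start_link/3 returns {:error, reason}, which the caller can match on. Because start_link links the new process to the caller, the sketch traps exits so the failed start comes back as a value instead of also taking down the calling process.

```elixir
defmodule Counter do
  # Hypothetical GenServer, used only to show start_link/3's return values.
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)

  @impl true
  def init(initial) when is_integer(initial), do: {:ok, initial}
  def init(_other), do: {:stop, :not_an_integer}
end

# The caller is linked to the process being started; trap exits so a
# failed start does not also kill this process.
Process.flag(:trap_exit, true)

for arg <- [0, :oops] do
  case Counter.start_link(arg) do
    {:ok, pid} -> IO.puts("started #{inspect(pid)}")
    {:error, reason} -> IO.puts("failed: #{inspect(reason)}")
  end
end
```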

Here is a blog post about error handling.

w1zeman1p

So I tried to just let it crash, but the tests failed :) They failed with a FunctionClauseError (no matching function clause) rather than an ArgumentError. Perhaps we should fix these tests? I believe you, but the tests don't pass, haha.
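To illustrate the failure described above: with only the guarded clause left, a call with an invalid character matches no clause, so Elixir raises FunctionClauseError, and a test that expects ArgumentError fails. A minimal sketch, with a hypothetical LetItCrash module standing in for the solution's validate_one/1:

```elixir
defmodule LetItCrash do
  @nucleotides 'ACTG'

  # Only the valid-input clause is defined; any other argument has no match.
  def validate_one(n) when n in @nucleotides, do: n
end

result =
  try do
    LetItCrash.validate_one(?X)
  rescue
    # Raised when no clause of the function matches the given arguments.
    _e in FunctionClauseError -> :function_clause_error
    _e in ArgumentError -> :argument_error
  end

IO.inspect(result)
```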

TFarla

@w1zeman1p In which case did it fail? Perhaps there is a nifty way to use pattern matching :)

w1zeman1p

If I remove the second validate_one clause (the one that raises ArgumentError), the tests fail.

TFarla

@w1zeman1p Excuse me, I didn't look at the test suite, and some tests do expect an ArgumentError. In that case I would remove the "not in @nucleotides" guard from the second validate_one clause, so that it simply catches every argument the first clause did not match.
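A minimal sketch of that suggestion, using a hypothetical Validation module: the second clause drops its guard and becomes a catch-all. Clauses are tried top to bottom, so every valid nucleotide has already matched the first clause, and the guard on the second one added nothing. The error message "invalid nucleotide" is an illustrative assumption.

```elixir
defmodule Validation do
  @nucleotides 'ACTG'

  # First clause: a valid nucleotide is returned unchanged.
  def validate_one(n) when n in @nucleotides, do: n
  # Catch-all clause: no guard needed, since valid input already matched
  # above. Anything else raises ArgumentError, as the old tests expect.
  def validate_one(_n), do: raise(ArgumentError, "invalid nucleotide")
end

IO.inspect(Validation.validate_one(?G))
```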

What can you learn from this solution?

A huge amount can be learned from reading other people's code. This is why we wanted to give Exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?