rootulp's solution to Nucleotide Count in the Java Track

Published at Jul 13 2018 · 1 comment

Note:

This solution was written on an old version of Exercism. The tests below might not correspond to the solution code, and the exercise may have changed since this code was written.

Given a single stranded DNA string, compute how many times each nucleotide occurs in the string.

The genetic language of every living thing on the planet is DNA. DNA is a large molecule built from an extremely long sequence of individual elements called nucleotides. Four types exist in DNA; they differ only slightly and can be represented by the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' for thymine.

Here is an analogy:

  • twigs are to birds nests as
  • nucleotides are to DNA as
  • legos are to lego houses as
  • words are to sentences as...

Java Tips

Since this exercise has difficulty 5, it doesn't come with any starter implementation. This is so that you get to practice creating classes and methods, which is an important part of programming in Java. It does mean that when you first try to run the tests, they won't compile. They will give you an error similar to:

 path-to-exercism-dir\exercism\java\name-of-exercise\src\test\java\ExerciseClassNameTest.java:14: error: cannot find symbol
        ExerciseClassName exerciseClassName = new ExerciseClassName();
        ^
 symbol:   class ExerciseClassName
 location: class ExerciseClassNameTest

This error occurs because the test refers to a class that hasn't been created yet (ExerciseClassName). To resolve the error you need to add a file matching the class name in the error to the src/main/java directory. For example, for the error above you would add a file called ExerciseClassName.java.
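As a minimal sketch, the new file could start out containing nothing but an empty class. ExerciseClassName is the placeholder name from the error message above, not a real exercise class, so substitute the name your own test reports.

```java
// Would live at src/main/java/ExerciseClassName.java.
// Placeholder name taken from the sample error message; use the class name from your own error.
class ExerciseClassName {
    // Intentionally empty: the class only has to exist for the
    // "cannot find symbol ... class ExerciseClassName" error to go away.
}
```

With no constructor declared, Java supplies the default no-argument one, so new ExerciseClassName() compiles.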

When you try to run the tests again you will get slightly different errors. You might get an error similar to:

  constructor ExerciseClassName in class ExerciseClassName cannot be applied to given types;
        ExerciseClassName exerciseClassName = new ExerciseClassName("some argument");
                                              ^
  required: no arguments
  found: String
  reason: actual and formal argument lists differ in length

This error means that you need to add a constructor to your new class. If you don't add a constructor, Java will add a default one for you. This default constructor takes no arguments. So if the tests expect your class to have a constructor which takes arguments, then you need to create this constructor yourself. In the example above you could add:

ExerciseClassName(String input) {

}

That should make the error go away, though you might need to add some more code to your constructor to make the test pass!
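As a rough sketch of the extra code a test may end up needing, the constructor can keep its argument in a field so that later methods can use it. The field name and the input() accessor are hypothetical and appear only for illustration.

```java
// Placeholder class name from the sample error message above.
class ExerciseClassName {
    // Hypothetical field: stores whatever the test passed in.
    private final String input;

    ExerciseClassName(String input) {
        this.input = input;
    }

    // Hypothetical accessor, included only to show the stored value being used.
    String input() {
        return input;
    }
}
```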

You might also get an error similar to:

  error: cannot find symbol
        assertEquals(expectedOutput, exerciseClassName.someMethod());
                                                       ^
  symbol:   method someMethod()
  location: variable exerciseClassName of type ExerciseClassName

This error means that you need to add a method called someMethod to your new class. In the example above you would add:

String someMethod() {
  return "";
}

Make sure the return type matches what the test is expecting. You can find out which return type it should have by looking at the type of the object it's being compared to in the tests. Alternatively, you could give your method an arbitrary return type (e.g. void) and run the tests again. The new error should tell you which type it's expecting.
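For instance, the Nucleotide Count tests on this page assign the result of nucleotideCounts() to a Map<Character, Integer>, so a first stub might be declared as below. Returning an empty map is an assumption made only to get the code compiling; it will not make the tests pass.

```java
import java.util.HashMap;
import java.util.Map;

// Compile-only stub: just enough structure for the test file to build.
class NucleotideCounter {

    NucleotideCounter(String dna) {
        // The input is ignored in this stub.
    }

    // Return type copied from the test:
    //   Map<Character, Integer> counts = nucleotideCounter.nucleotideCounts();
    Map<Character, Integer> nucleotideCounts() {
        return new HashMap<>(); // placeholder result, not a real count
    }
}
```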

After having resolved these errors you should be ready to start making the tests pass!

Running the tests

You can run all the tests for an exercise by entering

$ gradle test

in your terminal.

Source

The Calculating DNA Nucleotides problem at Rosalind: http://rosalind.info/problems/dna/

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

NucleotideCounterTest.java

import org.junit.Ignore;
import org.junit.Test;
import org.junit.Rule;
import org.junit.rules.ExpectedException;

import java.util.Map;

import static org.hamcrest.Matchers.*;
import static org.junit.Assert.*;

public class NucleotideCounterTest {

    @Rule
    public ExpectedException expectedException = ExpectedException.none();

    @Test
    public void testEmptyDnaStringHasNoNucleotides() {
        NucleotideCounter nucleotideCounter = new NucleotideCounter("");
        Map<Character, Integer> counts = nucleotideCounter.nucleotideCounts();
        assertThat(counts, allOf(
                hasEntry('A', 0),
                hasEntry('C', 0),
                hasEntry('G', 0),
                hasEntry('T', 0)
        ));
    }

    @Ignore("Remove to run test")
    @Test
    public void testDnaStringHasOneNucleotide() {
        NucleotideCounter nucleotideCounter = new NucleotideCounter("G");
        Map<Character, Integer> counts = nucleotideCounter.nucleotideCounts();
        assertThat(counts, allOf(
                hasEntry('A', 0),
                hasEntry('C', 0),
                hasEntry('G', 1),
                hasEntry('T', 0)
        ));
    }

    @Ignore("Remove to run test")
    @Test
    public void testRepetitiveSequenceWithOnlyGuanine() {
        NucleotideCounter nucleotideCounter = new NucleotideCounter("GGGGGGG");
        Map<Character, Integer> counts = nucleotideCounter.nucleotideCounts();
        assertThat(counts, allOf(
                hasEntry('A', 0),
                hasEntry('C', 0),
                hasEntry('G', 7),
                hasEntry('T', 0)
        ));
    }

    @Ignore("Remove to run test")
    @Test
    public void testDnaStringHasMultipleNucleotide() {
        NucleotideCounter nucleotideCounter
            = new NucleotideCounter("AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC");
        Map<Character, Integer> counts = nucleotideCounter.nucleotideCounts();
        assertThat(counts, allOf(
                hasEntry('A', 20),
                hasEntry('C', 12),
                hasEntry('G', 17),
                hasEntry('T', 21)
        ));
    }

    @Ignore("Remove to run test")
    @Test
    public void testDnaStringHasInvalidNucleotides() {
        expectedException.expect(IllegalArgumentException.class);
        NucleotideCounter nucleotideCounter = new NucleotideCounter("AGXXACT");
    }
}

DNA.java

import java.util.Map;
import java.util.HashMap;

public final class DNA {

  private final String chain;
  private static final String VALID_NUCLEOTIDES = "ACGT";

  public DNA(String chain) {
    this.chain = chain;
  }

  public int count(char nucleotide) {
    if (invalid(nucleotide)) { throw new IllegalArgumentException(); }

    try {
      return nucleotideCounts().get(nucleotide);
    } catch (NullPointerException e) {
      return 0;
    }
  }

  private static boolean invalid(char nucleotide) {
    return VALID_NUCLEOTIDES.indexOf(nucleotide) == -1;
  }

  public Map<Character, Integer> nucleotideCounts() {
    Map<Character, Integer> counts = emptyCounts();
    for (char c : chain.toCharArray()) {
      counts.put(c, counts.get(c) + 1);
    }
    return counts;
  }

  private static Map<Character, Integer> emptyCounts() {
    Map<Character, Integer> emptyCounts = new HashMap<Character, Integer>();
    for (char c : VALID_NUCLEOTIDES.toCharArray()) {
      emptyCounts.put(c, 0);
    }
    return emptyCounts;
  }
}

Community comments

jtigger

One of the points of this exercise is to consider the computational cost of an operation: here, count().

The human genome has 3 billion base pairs. How long would it take your solution to count them? Now, perhaps it's okay to wait the one time for that result, but if someone called count() again on the same instance of DNA (i.e. a result that wouldn't change), how long would that take?
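One way to address the repeated-call cost, sketched here under assumptions rather than taken from the author's code, is to scan the strand once in the constructor and answer every later call from the stored tally. The class name PrecountedDna is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of the DNA class above: the strand is scanned exactly once.
final class PrecountedDna {
    private static final String VALID_NUCLEOTIDES = "ACGT";
    private final Map<Character, Integer> counts;

    PrecountedDna(String chain) {
        Map<Character, Integer> tally = new HashMap<>();
        for (char c : VALID_NUCLEOTIDES.toCharArray()) {
            tally.put(c, 0); // prime every valid nucleotide with zero
        }
        for (char c : chain.toCharArray()) {
            if (VALID_NUCLEOTIDES.indexOf(c) == -1) {
                throw new IllegalArgumentException("Invalid nucleotide: " + c);
            }
            tally.put(c, tally.get(c) + 1);
        }
        // Read-only view, so callers cannot alter the cached result.
        this.counts = Collections.unmodifiableMap(tally);
    }

    // Constant-time lookup; no rescan of the strand.
    int count(char nucleotide) {
        Integer n = counts.get(nucleotide);
        if (n == null) {
            throw new IllegalArgumentException("Invalid nucleotide: " + nucleotide);
        }
        return n;
    }

    Map<Character, Integer> nucleotideCounts() {
        return counts;
    }
}
```

The trade-off is that construction always pays for the full scan, even if no count is ever requested; computing the tally lazily on first use would be an alternative.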

Is line 18 (the catch block for NullPointerException) necessary? In general, using checks instead of exceptions for flow control results in clearer code. Also, since you already prime the map with zeros and guard against unknown or invalid nucleotides, will a NullPointerException ever be thrown from line 17 (the call to nucleotideCounts().get(nucleotide))? What is more, if nucleotideCounts() threw a NullPointerException for some entirely different reason, line 18 would swallow that defect and quietly return zero, potentially giving very incorrect results.
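To make the point concrete: in the solution as written, a strand such as "AGXXACT" makes counts.get(c) return null inside nucleotideCounts(), and count() then reports 0 instead of an error. The sketch below shows one possible shape without the try/catch, assuming the strand is validated up front in the constructor. That validation, the requireValid helper and the class name ValidatedDna are additions for illustration, not part of the original solution.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical rewrite of the DNA class above using explicit checks only.
final class ValidatedDna {
    private static final String VALID_NUCLEOTIDES = "ACGT";
    private final String chain;

    ValidatedDna(String chain) {
        // Reject bad strands immediately instead of failing later with an NPE.
        for (char c : chain.toCharArray()) {
            requireValid(c);
        }
        this.chain = chain;
    }

    int count(char nucleotide) {
        requireValid(nucleotide);
        // Every valid key is primed with 0 below, so get() cannot return null here.
        return nucleotideCounts().get(nucleotide);
    }

    Map<Character, Integer> nucleotideCounts() {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : VALID_NUCLEOTIDES.toCharArray()) {
            counts.put(c, 0);
        }
        for (char c : chain.toCharArray()) {
            counts.put(c, counts.get(c) + 1);
        }
        return counts;
    }

    private static void requireValid(char nucleotide) {
        if (VALID_NUCLEOTIDES.indexOf(nucleotide) == -1) {
            throw new IllegalArgumentException("Invalid nucleotide: " + nucleotide);
        }
    }
}
```

Any NullPointerException that still escaped from this version would point at a genuine defect rather than being converted into a count of zero.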

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give Exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?