rabuf's solution

to Nucleotide Count in the Common Lisp Track

Published at Jan 20 2020

Given a single stranded DNA string, compute how many times each nucleotide occurs in the string.

The genetic language of every living thing on the planet is DNA. DNA is a large molecule that is built from an extremely long sequence of individual elements called nucleotides. Four types exist in DNA; they differ only slightly and can be represented by the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' for thymine.
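As a quick illustration of the task, the sketch below tallies the four symbols in a short strand using the standard COUNT function. The name tally-strand and the sample strand are made up for this example and are not part of the exercise.

```lisp
;; Hypothetical helper for illustration only: pair each nucleotide
;; symbol with the number of times it appears in STRAND.
(defun tally-strand (strand)
  (loop for nucleotide across "ACGT"
        collect (cons nucleotide (count nucleotide strand))))

(tally-strand "GATTACA")
;; => ((#\A . 3) (#\C . 1) (#\G . 1) (#\T . 2))
```

The exercise itself asks for a hash table rather than an association list; the list form is used here only because it prints readably.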

Here is an analogy:

  • twigs are to birds' nests as
  • nucleotides are to DNA as
  • legos are to lego houses as
  • words are to sentences as...

Setup

Check out Installing Common Lisp for instructions to get started or take a look at the guides available in the track's side bar.

Formatting

While Common Lisp doesn't care about indentation and layout of code, nor whether you use spaces or tabs, this is an important consideration for submissions to exercism.io. Exercism.io's code widget cannot handle mixing of tab and space characters well, so using only spaces is recommended to make the code more readable to the human reviewers. Please review your editor's settings on how to accomplish this. Below are instructions for popular editors for Common Lisp.

VIM

Use the following commands to ensure VIM uses only spaces for indentation:

:set tabstop=2
:set shiftwidth=2
:set expandtab

(or as a one-liner :set tabstop=2 shiftwidth=2 expandtab). This can be added to your ~/.vimrc file to use it all the time.

Emacs

Emacs is very well suited for editing Common Lisp and has many powerful add-on packages available. The only thing that one needs to do with a stock Emacs to make it work well with exercism.io is to evaluate the following code:

(setq-default indent-tabs-mode nil)

This can be placed in your ~/.emacs (or ~/.emacs.d/init.el) in order to have it set whenever Emacs is launched.

One suggested add-on for Emacs and Common Lisp is SLIME, which offers tight integration with the REPL, making iterative coding and testing very easy.

Source

The Calculating DNA Nucleotides problem at Rosalind http://rosalind.info/problems/dna/

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

nucleotide-count-test.lisp

(ql:quickload "lisp-unit")
#-xlisp-test (load "nucleotide-count")

(defpackage #:nucleotide-count-test
  (:use #:common-lisp #:lisp-unit))

(in-package #:nucleotide-count-test)

(defun make-hash (kvs)
  (reduce
   #'(lambda (h kv) (setf (gethash (first kv) h) (second kv)) h)
   kvs
   :initial-value (make-hash-table)))

(define-test empty-dna-strand-has-no-adenine
  (assert-equal 0 (dna:dna-count #\A "")))

(define-test empty-dna-strand-has-no-nucleotides
  (assert-equalp (make-hash '((#\A 0) (#\T 0) (#\C 0) (#\G 0)))
      (dna:nucleotide-counts "")))

(define-test repetitive-cytosine-gets-counted
  (assert-equal 5 (dna:dna-count #\C "CCCCC")))

(define-test repetitive-sequence-has-only-guanine
  (assert-equalp (make-hash '((#\A 0) (#\T 0) (#\C 0) (#\G 8)))
      (dna:nucleotide-counts "GGGGGGGG")))

(define-test counts-only-thymine
  (assert-equal 1 (dna:dna-count #\T "GGGGGTAACCCGG")))

(define-test validates-nucleotides
  (assert-error 'dna:invalid-nucleotide (dna:dna-count #\X "GACT")))

(define-test counts-all-nucleotides
  (assert-equalp (make-hash '((#\A 20) (#\T 21) (#\G 17) (#\C 12)))
      (dna:nucleotide-counts
       "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC")))

#-xlisp-test
(let ((*print-errors* t)
      (*print-failures* t))
  (run-tests :all :nucleotide-count-test))
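The tests compare hash tables with assert-equalp rather than assert-equal. A minimal sketch of the likely reason, using only standard Common Lisp: EQUAL treats two distinct hash tables as different objects, while EQUALP compares their entry count, test function, and entries. The function name compare-fresh-tables is invented for this illustration.

```lisp
;; Hypothetical demonstration: build two separate hash tables with the
;; same single entry and compare them with EQUAL and with EQUALP.
(defun compare-fresh-tables ()
  (let ((a (make-hash-table))
        (b (make-hash-table)))
    (setf (gethash #\A a) 1
          (gethash #\A b) 1)
    (list (equal a b)      ; NIL: EQUAL compares hash tables by identity
          (equalp a b))))  ; T: same count, same test, same entries

(compare-fresh-tables)
;; => (NIL T)
```

This is also why the order of the key/value pairs passed to make-hash in the tests does not matter.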

nucleotide-count.lisp

(in-package #:cl-user)
(defpackage #:dna
  (:use #:common-lisp)
  (:export #:dna-count #:nucleotide-counts #:invalid-nucleotide))

(in-package #:dna)

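;; Signaled when DNA-COUNT is asked about a character other than A, C, G or T.
(define-condition invalid-nucleotide (error) ())

(defun dna-count (nucleotide sequence)
  "Return how many times NUCLEOTIDE occurs in SEQUENCE.
Signal INVALID-NUCLEOTIDE unless NUCLEOTIDE is one of A, C, G or T."
  (unless (member nucleotide '(#\A #\C #\G #\T))
    (error 'invalid-nucleotide))
  (count nucleotide sequence))

(defun nucleotide-counts (sequence)
  "Return a hash table mapping each of A, C, G and T to its count in SEQUENCE."
  (let ((counts (make-hash-table)))
    (loop for c across "ACGT"
          do (setf (gethash c counts) (dna-count c sequence)))
    counts))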
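One trade-off in the solution above is that nucleotide-counts walks the strand four times, once per call to dna-count, and never checks the strand's own characters. A hypothetical single-pass variant is sketched below. The names nucleotide-counts-single-pass and invalid-strand-character are invented here; they are not part of the exercise or the author's code, and the test suite does not require strand validation.

```lisp
;; Hypothetical condition, not part of the exercise: signaled when the
;; strand itself contains a character other than A, C, G or T.
(define-condition invalid-strand-character (error) ())

(defun nucleotide-counts-single-pass (strand)
  "Return a hash table of counts for A, C, G and T, reading STRAND once."
  (let ((counts (make-hash-table)))
    ;; Start every nucleotide at zero so absent ones still appear.
    (loop for c across "ACGT"
          do (setf (gethash c counts) 0))
    ;; Walk the strand once, bumping the matching counter.
    (loop for c across strand
          do (multiple-value-bind (n present-p) (gethash c counts)
               (unless present-p
                 (error 'invalid-strand-character))
               (setf (gethash c counts) (1+ n))))
    counts))

(gethash #\A (nucleotide-counts-single-pass "GATTACA"))
;; => 3, T
```

For short strands the difference is negligible, and the original version has the advantage of reusing dna-count directly.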
