Picani's solution

to Nucleotide Count in the Common Lisp Track

Published at Jan 07 2020
Test suite

Given a single stranded DNA string, compute how many times each nucleotide occurs in the string.

The genetic language of every living thing on the planet is DNA. DNA is a large molecule built from an extremely long sequence of individual elements called nucleotides. Four types exist in DNA; they differ only slightly and can be represented by the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' for thymine.

Here is an analogy:

  • twigs are to birds nests as
  • nucleotides are to DNA as
  • legos are to lego houses as
  • words are to sentences as...
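
As a rough illustration of the task stated above, the standard `count` function already tallies one character in a string. The helper name `tally-nucleotides` is hypothetical and not part of the interface the exercise expects, and this sketch skips input validation entirely.

```lisp
;; Minimal sketch: tally each nucleotide with the standard COUNT function.
;; TALLY-NUCLEOTIDES is a hypothetical name used only for illustration.
(defun tally-nucleotides (strand)
  "Return an alist of (nucleotide . occurrences) for STRAND, a string."
  (loop for nt across "ACGT"
        collect (cons nt (count nt strand))))

;; Example call on a short strand.
(tally-nucleotides "GATTACA")
```

An association list keeps the sketch short; the exercise's tests expect a hash table instead.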


Check out Installing Common Lisp for instructions to get started or take a look at the guides available in the track's side bar.


While Common Lisp doesn't care about indentation and layout of code, nor whether you use spaces or tabs, this is an important consideration for submissions to exercism.io. Exercism.io's code widget does not handle a mix of tab and space characters well, so using only spaces is recommended to make the code more readable to human reviewers. Please review your editor's settings to find out how to accomplish this. Below are instructions for popular editors for Common Lisp.


Use the following commands to ensure VIM uses only spaces for indentation:

:set tabstop=2
:set shiftwidth=2
:set expandtab

(or, as a one-liner, :set tabstop=2 shiftwidth=2 expandtab). This can be added to your ~/.vimrc file to use it all the time.


Emacs is very well suited for editing Common Lisp and has many powerful add-on packages available. The only thing needed to make a stock Emacs work well with exercism.io is to evaluate the following code:

(setq-default indent-tabs-mode nil)

This can be placed in your ~/.emacs (or ~/.emacs.d/init.el) in order to have it set whenever Emacs is launched.

One suggested add-on for Emacs and Common Lisp is SLIME, which offers tight integration with the REPL, making iterative coding and testing very easy.


The Calculating DNA Nucleotides problem at Rosalind: http://rosalind.info/problems/dna/

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.


(ql:quickload "lisp-unit")
#-xlisp-test (load "nucleotide-count")

(defpackage #:nucleotide-count-test
  (:use #:common-lisp #:lisp-unit))

(in-package #:nucleotide-count-test)

(defun make-hash (kvs)
  ;; Build a hash table from a list of (key value) pairs.
  (reduce
   #'(lambda (h kv) (setf (gethash (first kv) h) (second kv)) h)
   kvs
   :initial-value (make-hash-table)))

(define-test empty-dna-strand-has-no-adenine
  (assert-equal 0 (dna:dna-count #\A "")))

(define-test empty-dna-strand-has-no-nucleotides
  (assert-equalp (make-hash '((#\A 0) (#\T 0) (#\C 0) (#\G 0)))
      (dna:nucleotide-counts "")))

(define-test repetitive-cytosine-gets-counted
  (assert-equal 5 (dna:dna-count #\C "CCCCC")))

(define-test repetitive-sequence-has-only-guanine
  (assert-equalp (make-hash '((#\A 0) (#\T 0) (#\C 0) (#\G 8)))
      (dna:nucleotide-counts "GGGGGGGG")))

(define-test counts-only-thymine
  (assert-equal 1 (dna:dna-count #\T "GGGGGTAACCCGG")))

(define-test validates-nucleotides
  (assert-error 'dna:invalid-nucleotide (dna:dna-count #\X "GACT")))

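(define-test counts-all-nucleotides
  ;; The strand argument is missing from this listing; the Rosalind sample
  ;; dataset, whose counts match the expected table, is assumed here.
  (assert-equalp (make-hash '((#\A 20) (#\T 21) (#\G 17) (#\C 12)))
      (dna:nucleotide-counts
       "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC")))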

(let ((*print-errors* t)
      (*print-failures* t))
  (run-tests :all :nucleotide-count-test))

(in-package #:cl-user)
(defpackage #:dna
  (:use #:common-lisp)
  (:export #:dna-count #:nucleotide-counts #:invalid-nucleotide))

(in-package #:dna)

(define-condition invalid-nucleotide (error) ())

(defun validate-nt (nt)
  "If nt is A, T, C, or G (case insensitive), do nothing.
   Otherwise, signal an invalid-nucleotide error."
  (unless (find nt "ATCG" :test #'char-equal)
    (error 'invalid-nucleotide)))

(defun dna-count (nt seq)
  "Return the number of that nucleotide in the sequence.
   nt is a char and seq is a string."
  (validate-nt nt)
  ;; Drop every character that does not match nt, then count what is left.
  (length (remove-if
           #'(lambda (n) (not (char-equal n nt)))
           seq)))

(defun nucleotide-counts (seq)
  "Count the occurrences of each nucleotide in the sequence.
   seq is a string."
  (let ((cnt (make-hash-table)))
    ;; Start every nucleotide at zero so absent ones are still reported.
    (setf (gethash #\A cnt) 0
          (gethash #\T cnt) 0
          (gethash #\C cnt) 0
          (gethash #\G cnt) 0)
    (dolist (nt (coerce seq 'list))
      (validate-nt nt)
      (setf (gethash nt cnt) (1+ (gethash nt cnt))))
    ;; Return the table of counts.
    cnt))

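One compromise worth noting: `validate-nt` accepts lowercase letters because it compares with `char-equal`, but the hash table is keyed on uppercase characters only. A lowercase nucleotide would therefore pass validation and then fail at `(1+ (gethash nt cnt))`, because `gethash` returns NIL for the missing key. The following is one possible variant, not the author's code: it upcases each character and iterates over the string directly. The package name `dna-sketch` and the function name `counts-normalized` are hypothetical, chosen so the block does not clash with the solution's `dna` package.

```lisp
;; Sketch only: the package and function names here are hypothetical.
(defpackage #:dna-sketch
  (:use #:common-lisp)
  (:export #:counts-normalized #:invalid-nucleotide))

(in-package #:dna-sketch)

(define-condition invalid-nucleotide (error) ())

(defun counts-normalized (seq)
  "Count nucleotides in the string SEQ, treating lowercase as uppercase."
  (let ((cnt (make-hash-table)))
    ;; Start every valid nucleotide at zero.
    (loop for nt across "ATCG"
          do (setf (gethash nt cnt) 0))
    ;; Iterate over the string directly; no intermediate list is built.
    (loop for raw across seq
          for nt = (char-upcase raw)
          do (if (nth-value 1 (gethash nt cnt)) ; is NT a known key?
                 (incf (gethash nt cnt))
                 (error 'invalid-nucleotide)))
    cnt))

;; Usage: HANDLER-CASE turns the error into an ordinary return value.
(handler-case (gethash #\G (counts-normalized "gGxa"))
  (invalid-nucleotide () :invalid))
```

Using the table's own keys as the validity check keeps the list of accepted nucleotides in one place, at the cost of coupling validation to the table's initialization.
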
What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?