
katrinleinweber's solution to Protein Translation in the Python Track

Published at Aug 24 2019


This exercise has changed since this solution was written.

Translate RNA sequences into proteins.

RNA can be broken into three-nucleotide sequences called codons, which are then translated into a polypeptide like so:

RNA: "AUGUUUUCU" => translates to

Codons: "AUG", "UUU", "UCU" => which become a polypeptide with the following sequence =>

Protein: "Methionine", "Phenylalanine", "Serine"
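The splitting step can be sketched in Python with plain slicing, stepping through the strand three characters at a time. This is only an illustration: the helper name split_codons is hypothetical, and the sketch assumes the strand length is a multiple of three (otherwise the last chunk comes out shorter than a codon).

```python
def split_codons(strand: str):
    """Split an RNA strand into consecutive three-nucleotide codons (hypothetical helper)."""
    # Step through the strand in increments of 3 and slice out each codon.
    return [strand[i:i + 3] for i in range(0, len(strand), 3)]


print(split_codons("AUGUUUUCU"))  # ['AUG', 'UUU', 'UCU']
```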

There are 64 codons, which in turn correspond to 20 amino acids; however, not all of the codon sequences and resulting amino acids are important in this exercise. If the program works for one codon, it should work for all of them. That said, feel free to expand the list in the test suite to include them all.

There are also three terminating codons (also known as 'STOP' codons); if the ribosome encounters any of them, translation ends and the protein is terminated.

All codons after a STOP codon are ignored, like this:


Codons: "AUG", "UUU", "UCU", "UAA", "AUG" =>

Protein: "Methionine", "Phenylalanine", "Serine"

Note that the stop codon "UAA" terminates translation, so the final methionine ("AUG") is not translated into the protein sequence.
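The stop rule amounts to keeping codons only until the first stop codon appears. One way to sketch this in Python is itertools.takewhile; the stop-codon set (UAA, UAG, UGA) matches the codons used in the test suite, while the function name codons_before_stop is hypothetical.

```python
from itertools import takewhile

# The three stop codons, as exercised by the exercise's test suite.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_before_stop(codons):
    """Return the codons that precede the first stop codon (hypothetical helper)."""
    # takewhile yields items until the predicate first returns False,
    # so the first stop codon and everything after it are dropped.
    return list(takewhile(lambda codon: codon not in STOP_CODONS, codons))


print(codons_before_stop(["AUG", "UUU", "UCU", "UAA", "AUG"]))  # ['AUG', 'UUU', 'UCU']
```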

Below are the codons and resulting amino acids needed for the exercise.

Codon                 Protein
AUG                   Methionine
UUU, UUC              Phenylalanine
UUA, UUG              Leucine
UCU, UCC, UCA, UCG    Serine
UAU, UAC              Tyrosine
UGU, UGC              Cysteine
UGG                   Tryptophan
UAA, UAG, UGA         STOP
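In code, a table like this can be held as a dictionary from codon to amino acid. The sketch below expands each row's comma-separated codons into separate keys; the row list includes the Serine and STOP codons exercised by the test suite, and the names ROWS and CODON_TABLE are hypothetical.

```python
# Each row pairs a comma-separated list of codons with the amino acid they encode.
ROWS = [
    ("AUG", "Methionine"),
    ("UUU, UUC", "Phenylalanine"),
    ("UUA, UUG", "Leucine"),
    ("UCU, UCC, UCA, UCG", "Serine"),
    ("UAU, UAC", "Tyrosine"),
    ("UGU, UGC", "Cysteine"),
    ("UGG", "Tryptophan"),
    ("UAA, UAG, UGA", "STOP"),
]

# Expand every row into one dictionary entry per codon.
CODON_TABLE = {
    codon: amino_acid
    for codons, amino_acid in ROWS
    for codon in codons.split(", ")
}

print(CODON_TABLE["UUC"])  # Phenylalanine
```

Writing the rows once and expanding them keeps the code visually close to the table, at the cost of a small amount of startup work.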

Learn more about protein translation on Wikipedia

Exception messages

Sometimes it is necessary to raise an exception. When you do this, you should include a meaningful error message to indicate what the source of the error is. This makes your code more readable and helps significantly with debugging. Not every exercise will require you to raise an exception, but for those that do, the tests will only pass if you include a message.

To attach a message to an exception, pass it as an argument to the exception type. For example, instead of raise Exception, you should write:

raise Exception("Meaningful message indicating the source of the error")
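Applied to this exercise, a translation helper could raise a ValueError whose message names the offending codon. This is a hypothetical sketch: the test suite on this page does not check invalid input, and the function name translate_codon, the abbreviated CODONS table, the choice of ValueError, and the message text are all assumptions.

```python
# Hypothetical lookup table, limited to a few codons for brevity.
CODONS = {"AUG": "Methionine", "UGG": "Tryptophan", "UAA": "STOP"}


def translate_codon(codon):
    """Translate one codon, raising a descriptive error for unknown input (hypothetical)."""
    try:
        return CODONS[codon]
    except KeyError:
        # Replace the bare KeyError with a message that identifies the bad codon.
        raise ValueError("Invalid codon: {!r}".format(codon))
```

With this helper, translate_codon("AUG") returns 'Methionine', while translate_codon("XYZ") fails with ValueError: Invalid codon: 'XYZ' instead of an unexplained KeyError.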

Running the tests

To run the tests, run the appropriate command below (the command differs between Python versions):

  • Python 2.7: py.test protein_translation_test.py
  • Python 3.4+: pytest protein_translation_test.py

Alternatively, you can tell Python to run the pytest module (allowing the same command to be used regardless of Python version): python -m pytest protein_translation_test.py

Common pytest options

  • -v : enable verbose output
  • -x : stop running tests on first failure
  • --ff : run the tests that failed in the previous run first, then the rest

For other options, see python -m pytest -h

Submitting Exercises

When submitting an exercise, make sure the solution is in the $EXERCISM_WORKSPACE/python/protein-translation directory.

You can find your Exercism workspace by running exercism debug and looking for the line that starts with Workspace.

For more detailed information about running tests, code style and linting, please see Running the Tests.


Source

Tyler Long

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.


protein_translation_test.py

import unittest

from protein_translation import proteins

# Tests adapted from `problem-specifications//canonical-data.json` @ v1.1.1

class ProteinTranslationTest(unittest.TestCase):

    def test_AUG_translates_to_methionine(self):
        self.assertEqual(proteins('AUG'), ['Methionine'])

    def test_identifies_Phenylalanine_codons(self):
        for codon in ['UUU', 'UUC']:
            self.assertEqual(proteins(codon), ['Phenylalanine'])

    def test_identifies_Leucine_codons(self):
        for codon in ['UUA', 'UUG']:
            self.assertEqual(proteins(codon), ['Leucine'])

    def test_identifies_Serine_codons(self):
        for codon in ['UCU', 'UCC', 'UCA', 'UCG']:
            self.assertEqual(proteins(codon), ['Serine'])

    def test_identifies_Tyrosine_codons(self):
        for codon in ['UAU', 'UAC']:
            self.assertEqual(proteins(codon), ['Tyrosine'])

    def test_identifies_Cysteine_codons(self):
        for codon in ['UGU', 'UGC']:
            self.assertEqual(proteins(codon), ['Cysteine'])

    def test_identifies_Tryptophan_codons(self):
        self.assertEqual(proteins('UGG'), ['Tryptophan'])

    def test_identifies_stop_codons(self):
        for codon in ['UAA', 'UAG', 'UGA']:
            self.assertEqual(proteins(codon), [])

    def test_translates_rna_strand_into_correct_protein_list(self):
        strand = 'AUGUUUUGG'
        expected = ['Methionine', 'Phenylalanine', 'Tryptophan']
        self.assertEqual(proteins(strand), expected)

    def test_stops_translation_if_stop_codon_at_beginning_of_sequence(self):
        strand = 'UAGUGG'
        expected = []
        self.assertEqual(proteins(strand), expected)

    def test_stops_translation_if_stop_codon_at_end_of_two_codon_sequence(
            self):
        strand = 'UGGUAG'
        expected = ['Tryptophan']
        self.assertEqual(proteins(strand), expected)

    def test_stops_translation_if_stop_codon_at_end_of_three_codon_sequence(
            self):
        strand = 'AUGUUUUAA'
        expected = ['Methionine', 'Phenylalanine']
        self.assertEqual(proteins(strand), expected)

    def test_stops_translation_if_stop_codon_in_middle_of_three_codon_sequence(
            self):
        strand = 'UGGUAGUGG'
        expected = ['Tryptophan']
        self.assertEqual(proteins(strand), expected)

    def test_stops_translation_if_stop_codon_in_middle_of_six_codon_sequence(
            self):
        strand = 'UGGUGUUAUUAAUGGUUU'
        expected = ['Tryptophan', 'Cysteine', 'Tyrosine']
        self.assertEqual(proteins(strand), expected)

if __name__ == '__main__':
    unittest.main()

protein_translation.py

from re import search
from textwrap import wrap

def translate(codon):
    return {
        'UUU': 'Phenylalanine',
        'UUC': 'Phenylalanine',
        'UUA': 'Leucine',
        'UUG': 'Leucine',
        'UCU': 'Serine',
        'UCC': 'Serine',
        'UCA': 'Serine',
        'UCG': 'Serine',
        'UAU': 'Tyrosine',
        'UAC': 'Tyrosine',
        'UGU': 'Cysteine',
        'UGC': 'Cysteine',
        'AUG': 'Methionine',
        'UGG': 'Tryptophan',
        'UAA': 'STOP', 'UAG': 'STOP', 'UGA': 'STOP',
    }[codon]

def proteins(strand: str):
    codons = wrap(strand, 3)

    # translate each codon in place; stop at the first STOP codon
    for i in range(len(codons)):
        codons[i] = translate(codons[i])
        if codons[i] == 'STOP':
            break

    # drop the STOP marker and any codons left untranslated after the break
    amino_acids = [c for c in codons if not search(r'([ACGU]{3}|STOP)', c)]

    return amino_acids


What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give Exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?
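As one possible answer to the first question: the solution above translates codons in place and then relies on a regular expression to tell leftover codons apart from amino-acid names. The sketch below is an alternative, not the author's code; it stops at the first stop codon so nothing needs filtering afterwards. The module-level CODON_TABLE is a choice made for this sketch, and unknown codons still raise KeyError as in the original.

```python
from textwrap import wrap

# Codon -> amino acid lookup covering every codon used in the test suite.
CODON_TABLE = {
    "AUG": "Methionine",
    "UUU": "Phenylalanine", "UUC": "Phenylalanine",
    "UUA": "Leucine", "UUG": "Leucine",
    "UCU": "Serine", "UCC": "Serine", "UCA": "Serine", "UCG": "Serine",
    "UAU": "Tyrosine", "UAC": "Tyrosine",
    "UGU": "Cysteine", "UGC": "Cysteine",
    "UGG": "Tryptophan",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def proteins(strand: str):
    """Translate an RNA strand, stopping at the first stop codon."""
    amino_acids = []
    # wrap() splits a whitespace-free string into chunks of width 3.
    for codon in wrap(strand, 3):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "STOP":
            # Translation ends here; later codons are never looked up.
            break
        amino_acids.append(amino_acid)
    return amino_acids
```

Because the loop breaks at the first stop codon, codons such as UUC that follow it are never translated or returned, and the table is built once rather than on every lookup.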