ðŸŽ‰ Exercism Research is now launched. Help Exercism, help science and have some fun at research.exercism.io ðŸŽ‰

# jeffvandyke's solution

## to SGF Parsing in the Python Track

Published at Jan 18 2021 · 0 comments
Instructions
Test suite
Solution

Parsing a Smart Game Format string.

SGF is a standard format for storing board game files, in particular go.

SGF is a fairly simple format. An SGF file usually contains a single tree of nodes where each node is a property list. The property list contains key value pairs, each key can only occur once but may have multiple values.

An SGF file may look like this:

``````(;FF[4]C[root]SZ[19];B[aa];W[ab])
``````

This is a tree with three nodes:

• The top level node has three properties: FF[4] (key = "FF", value = "4"), C[root](key = "C", value = "root") and SZ[19] (key = "SZ", value = "19"). (FF indicates the version of SGF, C is a comment and SZ is the size of the board.)
• The top level node has a single child which has a single property: B[aa]. (Black plays on the point encoded as "aa", which is the 1-1 point (which is a stupid place to play)).
• The B[aa] node has a single child which has a single property: W[ab].

As you can imagine an SGF file contains a lot of nodes with a single child, which is why there's a shorthand for it.

SGF can encode variations of play. Go players do a lot of backtracking in their reviews (let's try this, doesn't work, let's try that) and SGF supports variations of play sequences. For example:

``````(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))
``````

Here the root node has two variations. The first (which by convention indicates what's actually played) is where black plays on 1-1. Black was sent this file by his teacher who pointed out a more sensible play in the second child of the root node: `B[dd]` (4-4 point, a very standard opening to take the corner).

A key can have multiple values associated with it. For example:

``````(;FF[4];AB[aa][ab][ba])
``````

Here `AB` (add black) is used to add three black stones to the board.

There are a few more complexities to SGF (and parsing in general), which you can mostly ignore. You should assume that the input is encoded in UTF-8, the tests won't contain a charset property, so don't worry about that. Furthermore you may assume that all newlines are unix style (`\n`, no `\r` or `\r\n` will be in the tests) and that no optional whitespace between properties, nodes, etc will be in the tests.

The exercise will have you parse an SGF string and return a tree structure of properties. You do not need to encode knowledge about the data types of properties, just use the rules for the text type everywhere.

## Exception messages

Sometimes it is necessary to raise an exception. When you do this, you should include a meaningful error message to indicate what the source of the error is. This makes your code more readable and helps significantly with debugging. Not every exercise will require you to raise an exception, but for those that do, the tests will only pass if you include a message.

To raise a message with an exception, just write it as an argument to the exception type. For example, instead of `raise Exception`, you should write:

``````raise Exception("Meaningful message indicating the source of the error")
``````

## Running the tests

To run the tests, run `pytest sgf_parsing_test.py`

Alternatively, you can tell Python to run the pytest module: `python -m pytest sgf_parsing_test.py`

### Common `pytest` options

• `-v` : enable verbose output
• `-x` : stop running tests on first failure
• `--ff` : run failures from previous test before running other test cases

For other options, see `python -m pytest -h`

## Submitting Exercises

Note that, when trying to submit an exercise, make sure the solution is in the `\$EXERCISM_WORKSPACE/python/sgf-parsing` directory.

You can find your Exercism workspace by running `exercism debug` and looking for the line that starts with `Workspace`.

For more detailed information about running tests, code style and linting, please see Running the Tests.

## Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

### sgf_parsing_test.py

``````import unittest

from sgf_parsing import parse, SgfTree

# Tests adapted from `problem-specifications//canonical-data.json`

class SgfParsingTest(unittest.TestCase):
def test_empty_input(self):
input_string = ""
with self.assertRaisesWithMessage(ValueError):
parse(input_string)

def test_tree_with_no_nodes(self):
input_string = "()"
with self.assertRaisesWithMessage(ValueError):
parse(input_string)

def test_node_without_tree(self):
input_string = ";"
with self.assertRaisesWithMessage(ValueError):
parse(input_string)

def test_node_without_properties(self):
input_string = "(;)"
expected = SgfTree()
self.assertEqual(parse(input_string), expected)

def test_single_node_tree(self):
input_string = "(;A[B])"
expected = SgfTree(properties={"A": ["B"]})
self.assertEqual(parse(input_string), expected)

def test_multiple_properties(self):
input_string = "(;A[b]C[d])"
expected = SgfTree(properties={"A": ["b"], "C": ["d"]})
self.assertEqual(parse(input_string), expected)

def test_properties_without_delimiter(self):
input_string = "(;A)"
with self.assertRaisesWithMessage(ValueError):
parse(input_string)

def test_all_lowercase_property(self):
input_string = "(;a[b])"
with self.assertRaisesWithMessage(ValueError):
parse(input_string)

def test_upper_and_lowercase_property(self):
input_string = "(;Aa[b])"
with self.assertRaisesWithMessage(ValueError):
parse(input_string)

def test_two_nodes(self):
input_string = "(;A[B];B[C])"
expected = SgfTree(properties={"A": ["B"]}, children=[SgfTree({"B": ["C"]})])
self.assertEqual(parse(input_string), expected)

def test_two_child_trees(self):
input_string = "(;A[B](;B[C])(;C[D]))"
expected = SgfTree(
properties={"A": ["B"]},
children=[SgfTree({"B": ["C"]}), SgfTree({"C": ["D"]})],
)
self.assertEqual(parse(input_string), expected)

def test_multiple_property_values(self):
input_string = "(;A[b][c][d])"
expected = SgfTree(properties={"A": ["b", "c", "d"]})
self.assertEqual(parse(input_string), expected)

def test_escaped_property(self):
input_string = "(;A[\\]b\nc\nd\t\te \n\\]])"
expected = SgfTree(properties={"A": ["]b\nc\nd  e \n]"]})
self.assertEqual(parse(input_string), expected)

# Utility functions
def assertRaisesWithMessage(self, exception):
return self.assertRaisesRegex(exception, r".+")

if __name__ == "__main__":
unittest.main()``````
``````"""
Parses SGF using an iterator over the input (with no peeking), using a
recursive node parsing process where each parsing function is able to find an
expected sentinal value that confirms it's done.

Only a slight amount of duplication was needed to achieve this.

No external libs, no regexes.
"""

class SgfTree:
def __init__(self, properties=None, children=None):
self.properties = properties or {}
self.children = children or []

def __eq__(self, other):
if not isinstance(other, SgfTree):
return False
for k, v in self.properties.items():
if k not in other.properties:
return False
if other.properties[k] != v:
return False
for k in other.properties.keys():
if k not in self.properties:
return False
if len(self.children) != len(other.children):
return False
for a, b in zip(self.children, other.children):
if a != b:
return False
return True

def __ne__(self, other):
return not self == other

def gather_key_remainder(input_iter):
"""Gathers remainder of key from input_iter, stopping when a '[' is
reached.  """
result = ""
for test in input_iter:
if test == '[':
return result
if test.isupper():
result += test
else:
raise ValueError(
f"Unexpected char \"{test}\" found while parsing prop key")
raise ValueError("Unexpected end of string found while parsing prop key")

def gather_value(input_iter):
"""From input_iter having just been a '[' char, parses the remainder of the
value, returning when a (unescaped) ']' is found."""
value = ""
# Continue parsing value
test = next(input_iter)
while test != ']':
if test == '\\':  # Handle escape
test = next(input_iter)
if test.isspace() and test != "\n":
test = ' '
value += test
test = next(input_iter)

return value

def parse_node_end_paren(input_iter) -> SgfTree:
"""
Returns a complete node with children (consuming input_iter), when a
matching parentheses from a parent call was found.
"""
properties = dict()
children = []
last_prop_key = None

for test in input_iter:
if test == '(':
if next(input_iter) != ';':
raise ValueError("Open paren must be followed by semicolon")
children.append(parse_node_end_paren(input_iter))
# Continue after paren was found - We've found a '(' to match it
elif test == ')':
break  # Found our paren, we're done here
elif test == ';':
children.append(parse_node_end_paren(input_iter))
break  # A closing paren was found in child, we're done here

elif test.isupper():
last_prop_key = test + gather_key_remainder(input_iter)
properties[last_prop_key] = []
# Now, we should have had a '[' after gather_key_remainder, so
# gather values
properties[last_prop_key].append(gather_value(input_iter))
elif test == '[':
# Got another value
properties[last_prop_key].append(gather_value(input_iter))
else:
raise ValueError(f"Unexpected '{test}' char while parsing node")

return SgfTree(properties if len(properties) > 0 else None,
children if len(children) > 0 else None)

def parse(input_string) -> SgfTree:
"""Parses a valid sgf string"""
if input_string == "":
raise ValueError("Cannot parse empty string")
input_iter = iter(input_string)
if next(input_iter) != '(':
raise ValueError("Did not open with paren")
if next(input_iter) != ';':
raise ValueError('2nd was not a semicolon')

return parse_node_end_paren(input_iter)``````