Avatar of ErikSchierboom

ErikSchierboom's solution

to SGF Parsing in the C# Track

Published at Jul 13 2018 · 0 comments
Instructions
Test suite
Solution

Note:

This solution was written on an old version of Exercism. The tests below might not correspond to the solution code, and the exercise may have changed since this code was written.

Parsing a Smart Game Format string.

SGF is a standard format for storing board game files, in particular go.

SGF is a fairly simple format. An SGF file usually contains a single tree of nodes where each node is a property list. The property list contains key value pairs, each key can only occur once but may have multiple values.

An SGF file may look like this:

(;FF[4]C[root]SZ[19];B[aa];W[ab])

This is a tree with three nodes:

  • The top level node has two properties: FF[4] (key = "FF", value = "4") and C[root](key = "C", value = "root"). (FF indicates the version of SGF and C is a comment.)
    • The top level node has a single child which has a single property: B[aa]. (Black plays on the point encoded as "aa", which is the 1-1 point (which is a stupid place to play)).
      • The B[aa] node has a single child which has a single property: W[ab].

As you can imagine an SGF file contains a lot of nodes with a single child, which is why there's a shorthand for it.

SGF can encode variations of play. Go players do a lot of backtracking in their reviews (let's try this, doesn't work, let's try that) and SGF supports variations of play sequences. For example:

(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))

Here the root node has two variations. The first (which by convention indicates what's actually played) is where black plays on 1-1. Black was sent this file by his teacher who pointed out a more sensible play in the second child of the root node: B[dd] (4-4 point, a very standard opening to take the corner).

A key can have multiple values associated with it. For example:

(;FF[4];AB[aa][ab][ba])

Here AB (add black) is used to add three black stones to the board.

There are a few more complexities to SGF (and parsing in general), which you can mostly ignore. You should assume that the input is encoded in UTF-8, the tests won't contain a charset property, so don't worry about that. Furthermore you may assume that all newlines are unix style (\n, no \r or \r\n will be in the tests) and that no optional whitespace between properties, nodes, etc will be in the tests.

The exercise will have you parse an SGF string and return a tree structure of properties. You do not need to encode knowledge about the data types of properties, just use the rules for the text type everywhere.

Hints

  • To parse the text, you could try to use the Sprache library. You can also find a good tutorial here.

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

SgfParsingTest.cs

using System;
using System.Collections.Generic;
using System.Linq;
using Xunit;

public class SgfParsingTest
{
    // This exercises requires you to do somewhat more complicated parsing.
    // This is the perfect exercise to get to know a parser library, of which
    // there are several available. A simple one is Sprache (https://github.com/sprache/Sprache),
    // a more complex one is Antlr (http://www.antlr.org/). You can also build
    // your own parser of course!

    // A tree consists of two parts:
    // - The node's data: a map of type Dictionary<string, string list> 
    // - A list of children (which are itself trees)

    private static SgfTree TreeWithChildren(IDictionary<string, string[]> data, SgfTree[] children) => new SgfTree(data, children);
    private static SgfTree TreeWithSingleChild(IDictionary<string, string[]> data, SgfTree child) => new SgfTree(data, child);
    private static SgfTree TreeWithNoChildren(IDictionary<string, string[]> data) => new SgfTree(data);

    private static IDictionary<string, string[]> CreateData(string key, params string[] values) => new Dictionary<string, string[]> { [key] = values };
    
    [Fact]
    public void Empty_value()
    {
        const string input = "";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(input));
    }

    [Fact(Skip = "Remove to run test")]
    public void Tree_without_nodes()
    {
        const string input = "()";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(input));
    }

    [Fact(Skip = "Remove to run test")]
    public void Node_without_tree()
    {
        const string input = ";";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(input));
    }

    [Fact(Skip = "Remove to run test")]
    public void Node_without_properties()
    {
        const string input = "(;)";
        var expected = TreeWithNoChildren(new Dictionary<string, string[]>());
        Assert.Equal(expected, SgfParser.ParseTree(input), SgfTreeEqualityComparer.Instance);
    }

    [Fact(Skip = "Remove to run test")]
    public void Single_node_tree()
    {
        const string input = "(;A[B])";
        var expected = TreeWithNoChildren(CreateData("A", "B"));
        Assert.Equal(expected, SgfParser.ParseTree(input), SgfTreeEqualityComparer.Instance);
    }

    [Fact(Skip = "Remove to run test")]
    public void Properties_without_delimiter()
    {
        const string input = "(;a)";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(input));
    }

    [Fact(Skip = "Remove to run test")]
    public void All_lowercase_property()
    {
        const string input = "(;a[b])";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(input));
    }

    [Fact(Skip = "Remove to run test")]
    public void Upper_and_lowercase_property()
    {
        const string input = "(;Aa[b])";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(input));
    }

    [Fact(Skip = "Remove to run test")]
    public void Two_nodes()
    {
        const string input = "(;A[B];B[C])";
        var expected = TreeWithSingleChild(CreateData("A", "B"), TreeWithNoChildren(CreateData("B", "C")));
        Assert.Equal(expected, SgfParser.ParseTree(input), SgfTreeEqualityComparer.Instance);
    }

    [Fact(Skip = "Remove to run test")]
    public void Two_child_trees()
    {
        const string input = "(;A[B](;B[C])(;C[D]))";
        var expected = TreeWithChildren(CreateData("A", "B"), 
                            new[]
                            {
                                TreeWithNoChildren(CreateData("B", "C")),
                                TreeWithNoChildren(CreateData("C", "D"))
                            });
        Assert.Equal(expected, SgfParser.ParseTree(input), SgfTreeEqualityComparer.Instance);
    }

    [Fact(Skip = "Remove to run test")]
    public void Multiple_properties()
    {
        const string input = "(;A[b][c][d])";
        var expected = TreeWithNoChildren(CreateData("A", "b", "c", "d"));
        Assert.Equal(expected, SgfParser.ParseTree(input), SgfTreeEqualityComparer.Instance);
    }

    [Fact(Skip = "Remove to run test")]
    public void Escaped_property()
    {
        const string input = @"(;A[\]b\nc\nd\t\te \n\]])";
        var expected = TreeWithNoChildren(CreateData("A", "]b\nc\nd  e \n]"));
        Assert.Equal(expected, SgfParser.ParseTree(input), SgfTreeEqualityComparer.Instance);
    }

    private class SgfTreeEqualityComparer : IEqualityComparer<SgfTree>
    {
        public static readonly SgfTreeEqualityComparer Instance = new SgfTreeEqualityComparer();

        public bool Equals(SgfTree x, SgfTree y)
        {
            return SgfDataEqualityComparer.Instance.Equals(x.Data, y.Data) && x.Children.SequenceEqual(y.Children, Instance);
        }

        public int GetHashCode(SgfTree obj)
        {
            throw new System.NotImplementedException();
        }
    }

    private class SgfDataEqualityComparer : IEqualityComparer<IDictionary<string, string[]>>
    {
        public static readonly SgfDataEqualityComparer Instance = new SgfDataEqualityComparer();

        public bool Equals(IDictionary<string, string[]> x, IDictionary<string, string[]> y)
        {
            return x.Keys.SequenceEqual(y.Keys) && x.Keys.All(key => x[key].SequenceEqual(y[key]));
        }

        public int GetHashCode(IDictionary<string, string[]> obj)
        {
            throw new System.NotImplementedException();
        }
    }
}
using System;
using System.Collections.Generic;
using System.Linq;
using Sprache;

public class SgfTree
{
    public IDictionary<string, string[]> Data { get; }
    public SgfTree[] Children { get; }

    public SgfTree(IDictionary<string, string[]> data, params SgfTree[] children)
    {
        Data = data;
        Children = children;
    }
}

public static class SgfParser
{
    private static char Unescape(char c) => c == 'n' || c == 'r' || c == 't' ? ' ' : c;
    private static string ToString(IEnumerable<char> chars) => new string(chars.ToArray());

    private static readonly Parser<char> NormalChar = Parse.Char(c => c != '\\' && c != ']', "Normal char");
    private static readonly Parser<char> EscapedChar = Parse.Char('\\').Then(_ => Parse.AnyChar.Token().Select(Unescape));
    private static readonly Parser<string> CValueType = EscapedChar.Or(NormalChar).Many().Select(ToString);
    private static readonly Parser<string> PropValue = CValueType.Contained(Parse.Char('['), Parse.Char(']'));
    private static readonly Parser<string> PropIdent = Parse.Char(char.IsUpper, "Upper case character").Select(c => c.ToString());
    private static readonly Parser<IDictionary<string, string[]>> Property = PropIdent.Then(ident => PropValue.Many().Select(values => PropertyToData(ident, values)));
    private static readonly Parser<IDictionary<string, string[]>> Node = Parse.Char(';').Then(_ => Property.Optional().Select(o => o.GetOrDefault() ?? new Dictionary<string, string[]>()));

    private static Parser<SgfTree> GameTree() => 
        Parse.Char('(').Then(c => 
            Node.AtLeastOnce().Then(nodes => 
                GameTree().Many().Then(trees => Parse.Char(')')
                    .Select(___ => NodesToTree(nodes, trees)))));

    public static SgfTree ParseTree(string input) => GameTree().Parse(input);
    
    private static SgfTree NodesToTree(IEnumerable<IDictionary<string, string[]>> properties, IEnumerable<SgfTree> trees)
    {
        var numberOfProperties = properties.Count();

        if (numberOfProperties == 0)
        {
            throw new InvalidOperationException("Can only create tree from non-empty nodes list");
        }

        if (numberOfProperties == 1)
        {
            return new SgfTree(properties.First(), trees.ToArray());
        }

        return new SgfTree(properties.First(), NodesToTree(properties.Skip(1), trees));
    }

    private static IDictionary<string, string[]> PropertyToData(string key, IEnumerable<string> values) =>
        new Dictionary<string, string[]>
        {
            [key] = values.ToArray()
        };
}

Community comments

Find this solution interesting? Ask the author a question to learn more.

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?