Avatar of cmccandless

cmccandless's solution

to SGF Parsing in the C# Track

Published at Jul 13 2018 · 0 comments
Instructions
Test suite
Solution

Parsing a Smart Game Format string.

SGF is a standard format for storing board game files, in particular go.

SGF is a fairly simple format. An SGF file usually contains a single tree of nodes where each node is a property list. The property list contains key value pairs, each key can only occur once but may have multiple values.

An SGF file may look like this:

(;FF[4]C[root]SZ[19];B[aa];W[ab])

This is a tree with three nodes:

  • The top level node has three properties: FF[4] (key = "FF", value = "4"), C[root](key = "C", value = "root") and SZ[19] (key = "SZ", value = "19"). (FF indicates the version of SGF, C is a comment and SZ is the size of the board.)
    • The top level node has a single child which has a single property: B[aa]. (Black plays on the point encoded as "aa", which is the 1-1 point (which is a stupid place to play)).
      • The B[aa] node has a single child which has a single property: W[ab].

As you can imagine an SGF file contains a lot of nodes with a single child, which is why there's a shorthand for it.

SGF can encode variations of play. Go players do a lot of backtracking in their reviews (let's try this, doesn't work, let's try that) and SGF supports variations of play sequences. For example:

(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))

Here the root node has two variations. The first (which by convention indicates what's actually played) is where black plays on 1-1. Black was sent this file by his teacher who pointed out a more sensible play in the second child of the root node: B[dd] (4-4 point, a very standard opening to take the corner).

A key can have multiple values associated with it. For example:

(;FF[4];AB[aa][ab][ba])

Here AB (add black) is used to add three black stones to the board.

There are a few more complexities to SGF (and parsing in general), which you can mostly ignore. You should assume that the input is encoded in UTF-8, the tests won't contain a charset property, so don't worry about that. Furthermore you may assume that all newlines are unix style (\n, no \r or \r\n will be in the tests) and that no optional whitespace between properties, nodes, etc will be in the tests.

The exercise will have you parse an SGF string and return a tree structure of properties. You do not need to encode knowledge about the data types of properties, just use the rules for the text type everywhere.

Hints

  • To parse the text, you could try to use the Sprache library. You can also find a good tutorial here.

Running the tests

To run the tests, run the command dotnet test from within the exercise directory.

Initially, only the first test will be enabled. This is to encourage you to solve the exercise one step at a time. Once you get the first test passing, remove the Skip property from the next test and work on getting that test passing. Once none of the tests are skipped and they are all passing, you can submit your solution using exercism submit SgfParsing.cs

Further information

For more detailed information about the C# track, including how to get help if you're having trouble, please visit the exercism.io C# language page.

SgfParsingTest.cs

// This file was auto-generated based on version 1.2.0 of the canonical data.

using System;
using System.Collections.Generic;
using Xunit;

public class SgfParsingTest
{
    [Fact]
    public void Empty_input()
    {
        var encoded = "";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Tree_with_no_nodes()
    {
        var encoded = "()";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Node_without_tree()
    {
        var encoded = ";";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Node_without_properties()
    {
        var encoded = "(;)";
        var expected = new SgfTree(new Dictionary<string, string[]>());
        Assert.Equal(expected, SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Single_node_tree()
    {
        var encoded = "(;A[B])";
        var expected = new SgfTree(new Dictionary<string, string[]> { ["A"] = new[] { "B" } });
        Assert.Equal(expected, SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Multiple_properties()
    {
        var encoded = "(;A[b]C[d])";
        var expected = new SgfTree(new Dictionary<string, string[]> { ["A"] = new[] { "b" }, ["C"] = new[] { "d" } });
        Assert.Equal(expected, SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Properties_without_delimiter()
    {
        var encoded = "(;A)";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void All_lowercase_property()
    {
        var encoded = "(;a[b])";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Upper_and_lowercase_property()
    {
        var encoded = "(;Aa[b])";
        Assert.Throws<ArgumentException>(() => SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Two_nodes()
    {
        var encoded = "(;A[B];B[C])";
        var expected = new SgfTree(new Dictionary<string, string[]> { ["A"] = new[] { "B" } }, new SgfTree(new Dictionary<string, string[]> { ["B"] = new[] { "C" } }));
        Assert.Equal(expected, SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Two_child_trees()
    {
        var encoded = "(;A[B](;B[C])(;C[D]))";
        var expected = new SgfTree(new Dictionary<string, string[]> { ["A"] = new[] { "B" } }, new SgfTree(new Dictionary<string, string[]> { ["B"] = new[] { "C" } }), new SgfTree(new Dictionary<string, string[]> { ["C"] = new[] { "D" } }));
        Assert.Equal(expected, SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Multiple_property_values()
    {
        var encoded = "(;A[b][c][d])";
        var expected = new SgfTree(new Dictionary<string, string[]> { ["A"] = new[] { "b", "c", "d" } });
        Assert.Equal(expected, SgfParser.ParseTree(encoded));
    }

    [Fact(Skip = "Remove to run test")]
    public void Escaped_property()
    {
        var encoded = "(;A[\\]b\\nc\\nd\\t\\te \\n\\]])";
        var expected = new SgfTree(new Dictionary<string, string[]> { ["A"] = new[] { "]b\nc\nd  e \n]" } });
        Assert.Equal(expected, SgfParser.ParseTree(encoded));
    }
}
using System;
using System.Collections.Generic;
using System.Linq;

public class SgfTree : IEquatable<SgfTree>
{
    public SgfTree(IDictionary<string, string[]> data, params SgfTree[] children)
    {
        Data = data;
		Children = children;
    }

    private IDictionary<string, string[]> Data { get; }
    private SgfTree[] Children { get; set; }

	public void AddChild(SgfTree child) => Children = Children.Append(child).ToArray();

	public void AddData((string key, string[] values) keyValuePair) => AddData(keyValuePair.key, keyValuePair.values);
	public void AddData(string key, string[] values) =>
		Data.Add(
			key.AssertThat(k => k != string.Empty, "Property key cannot be empty"),
			values.AssertThat(vs => vs.Length != 0, "Property must have at least one value")
		);

    public bool Equals(SgfTree other) => $"{this}" == $"{other}";

	private static string FormatData(IDictionary<string, string[]> data) => string.Join(
		"",
		from key in data.Keys
		let values = data[key].ToString(delimiter: "][")
		select $"{key}[{values}]"
	);

	private static string FormatChildren(SgfTree[] children)
	{
		var ret = children.ToString(delimiter: string.Empty);
		if (children.Length == 1) ret = ret.Substring(1, ret.Length - 2);
		return ret;
	}

	public override string ToString() => $"(;{FormatData(Data)}{FormatChildren(Children)})";
}

public class SgfParser
{
	private const char TREE_START = '(';
	private const char TREE_END = ')';
	private const char NODE_START = ';';
	private const char PROPERTY_VALUE_START = '[';
	private const char PROPERTY_VALUE_END = ']';

	private static string ParsePropertyKey(Stack<char> stack) =>
		new string(stack.PopWhile(Char.IsUpper).ToArray());

	private static string ParsePropertyValue(Stack<char> stack)
	{
		stack.Pop(); // Ignore PROPERTY_VALUE_START
		var value = new string(stack.PopWhile(ch => ch != PROPERTY_VALUE_END, s => s.PopEscapable()).ToArray());
		stack.Pop(); // Ignore PROPERTY_VALUE_END
		return value;
	}

	private static IEnumerable<string> ParsePropertyValues(Stack<char> stack) =>
		stack.PopWhile(ch => ch == PROPERTY_VALUE_START, ParsePropertyValue);

	private static (string key, string[] values) ParseProperty(Stack<char> stack) => (
		ParsePropertyKey(stack),
		ParsePropertyValues(stack).ToArray()
	);

	private static SgfTree ParseTree(Stack<char> stack)
	{
		char ch ;
		if (!stack.TryPop(out ch) || ch != TREE_START)
			throw new ArgumentException($"missing '{TREE_START}'");
		if (!stack.TryPop(out ch) || ch != NODE_START)
			throw new ArgumentException($"missing '{NODE_START}'");

		var root = new SgfTree(new Dictionary<string, string[]>());
		while (stack.TryPeek(out ch))
		{
			switch (ch)
			{
				// Single Child
				case NODE_START:
					root.AddChild(ParseTree(stack.Append(TREE_END).Reverse().Append(TREE_START).ToStack()));
					return root;

				// New Subtree
				case TREE_START:
					root.AddChild(ParseTree(stack));
					break;

				// End current tree
				case TREE_END:
					stack.Pop();
					return root;

				// Property
				default:
					root.AddData(ParseProperty(stack));
					break;
			}
		}
		throw new ArgumentException("incomplete tree");
	}

    public static SgfTree ParseTree(string input) => ParseTree(input.Reverse().ToStack());
}

static class Extensions
{
	public static IEnumerable<T> PopWhile<T>(this Stack<T> stack, Func<T, bool> predicate) =>
		stack.PopWhile(predicate, s => s.Pop());

	public static IEnumerable<TOut> PopWhile<TIn, TOut>(this Stack<TIn> stack, Func<TIn, bool> predicate, Func<Stack<TIn>, TOut> popFunc)
	{
		TIn outVal;
		while (stack.TryPeek(out outVal) && predicate(outVal))
			yield return popFunc(stack);
	}
	public static char PopEscapable(this Stack<char> stack)
	{
		var escapables = new Dictionary<char, char>{
			['n'] = '\n', // Preserve escaped newline
			['t'] = ' ',  // Replace escaped tab with space
		};
		var ch = stack.Pop();
		return ch == '\\' ?
			escapables.GetValueOrDefault(stack.Peek(), stack.Pop()) :
			ch;
	}

	public static Stack<T> ToStack<T>(this IEnumerable<T> col) => new Stack<T>(col);

	public static string ToString<T>(this IEnumerable<T> col, string delimiter = ",") =>
		string.Join(delimiter, col.Select(c => c.ToString()));

	public static T AssertThat<T>(this T obj, Func<T, bool> predicate, string msg = null)
	{
		if (predicate(obj)) return obj;
		throw new ArgumentException(msg);
	}
}

Community comments

Find this solution interesting? Ask the author a question to learn more.

What can you learn from this solution?

A huge amount can be learned from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that you could read more about to improve your understanding?