dgeiger's solution to Word Count in the Delphi Pascal Track

Published at Sep 04 2020

Given a phrase, count the occurrences of each word in that phrase.

For the purposes of this exercise you can expect that a word will always be one of:

  1. A number composed of one or more ASCII digits (e.g. "0" or "1234"), OR
  2. A simple word composed of one or more ASCII letters (e.g. "a" or "they"), OR
  3. A contraction of two simple words joined by a single apostrophe (e.g. "it's" or "they're")

When counting words you can assume the following rules:

  1. The count is case insensitive (e.g. "You", "you", and "YOU" are 3 uses of the same word)
  2. The count is unordered; the tests will ignore how words and counts are ordered
  3. Other than the apostrophe in a contraction, all forms of punctuation are ignored
  4. The words can be separated by any form of whitespace (e.g. "\t", "\n", " ")

For example, for the phrase

  "That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.

the count would be:

that's: 1
the: 2
password: 2
123: 1
cried: 1
special: 1
agent: 1
so: 1
i: 1
fled: 1
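
The rules above can be sketched as a small Delphi console program. This is only an illustrative sketch, not the solution published further down: the names WordCountSketch and CountWordsSketch are hypothetical, and it makes the simplifying assumption that every character other than an ASCII letter, digit or apostrophe separates words.

```delphi
program WordCountSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

// Hypothetical helper, not part of the exercise files. The caller owns
// (and must free) the returned dictionary.
function CountWordsSketch(const Phrase: string): TDictionary<string, Integer>;
var
  Text, Word: string;
  I, Count: Integer;
begin
  Result := TDictionary<string, Integer>.Create;

  // Rule 1: the count is case insensitive. The trailing space guarantees
  // that the last word is followed by a separator.
  Text := LowerCase(Phrase) + ' ';
  Word := '';

  for I := 1 to Length(Text) do
    if CharInSet(Text[I], ['a'..'z', '0'..'9', '''']) then
      // Letters, digits and apostrophes build up the current word
      Word := Word + Text[I]
    else
    begin
      // Assumption: anything else (whitespace or punctuation) ends the word.
      // Apostrophes only count inside a word, so trim them from both ends.
      while (Word <> '') and (Word[1] = '''') do
        Delete(Word, 1, 1);
      while (Word <> '') and (Word[Length(Word)] = '''') do
        Delete(Word, Length(Word), 1);

      if Word <> '' then
      begin
        // Start at zero for a new word, then store the incremented count
        if not Result.TryGetValue(Word, Count) then
          Count := 0;
        Result.AddOrSetValue(Word, Count + 1);
      end;

      Word := '';
    end;
end;

var
  Counts: TDictionary<string, Integer>;
  Pair: TPair<string, Integer>;
begin
  // #10 is a real line feed; Delphi string literals have no "\n" escape
  Counts := CountWordsSketch(
    '"That''s the password: ''PASSWORD 123''!", cried the Special Agent.'
    + #10 + 'So I fled.');
  try
    // The dictionary is unordered, so the lines may print in any order
    for Pair in Counts do
      Writeln(Pair.Key, ': ', Pair.Value);
  finally
    Counts.Free;
  end;
end.
```

Run against the example phrase, the sketch should report the counts listed above, in no particular order.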

Testing

In order to run the tests for this track, you will need to install DUnitX. Please see the installation instructions for more information.

Loading Exercises into Delphi

If Delphi is properly installed, and *.dpr file types have been associated with Delphi, then double-clicking the supplied *.dpr file will start Delphi and load the exercise/project. Ctrl+F9 is the keyboard shortcut to compile the project; pressing F9 will compile and run it.

Alternatively, you may opt to start Delphi and load your project via the File drop-down menu.

When Questions Come Up

We monitor the Pascal-Delphi support room on gitter.im to help you with any questions that might arise.

Submitting Exercises

When submitting an exercise, make sure the exercise file you're submitting is in the exercism/delphi/<exerciseName> directory.

For example, if you're submitting ubob.pas for the Bob exercise, the submit command would be something like exercism submit <path_to_exercism_dir>/delphi/bob/ubob.pas.

Source

This is a classic toy problem, but we were reminded of it by seeing it in the Go Tour.

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you may request help from a mentor.

uWordCountTests.pas

unit uWordCountTests;

interface
uses
  System.Generics.Collections, DUnitX.TestFramework;

const
  CanonicalVersion = '1.3.0';

type

  [TestFixture]
  WordCountTests = class(TObject)
  private
    Expected,
    Actual: TDictionary<String, integer>;
    procedure CompareDictionaries(Expected, Actual: TDictionary<String, integer>);
  public
    [Setup]
    procedure Setup;

    [TearDown]
    procedure TearDown;

    [Test]
    procedure Validate_CompareDictionaries;

    [Test]
//    [Ignore('Comment the "[Ignore]" statement to run the test')]
    procedure Count_one_word;

    [Test]
    [Ignore]
    procedure Count_one_of_each_word;

    [Test]
    [Ignore]
    procedure Multiple_occurrences_of_a_word;

    [Test]
    [Ignore]
    procedure Handles_cramped_lists;

    [Test]
    [Ignore]
    procedure Handles_expanded_lists;

    [Test]
    [Ignore]
    procedure Ignore_punctuation;

    [Test]
    [Ignore]
    procedure Include_numbers;

    [Test]
    [Ignore]
    procedure Normalize_case;

    [Test]
    [Ignore]
    procedure With_apostrophes;

    [Test]
    [Ignore]
    procedure With_quotations;

    [Test]
    [Ignore]
    procedure Multiple_spaces_not_detected_as_a_word;

    [Test]
    [Ignore]
    procedure Alternating_word_separators_not_detected_as_a_word;
  end;

implementation

uses SysUtils, uWordCount;


procedure WordCountTests.CompareDictionaries(Expected, Actual: TDictionary<String, Integer>);
var
  expectedPair: TPair<string, Integer>;
begin
  Assert.AreEqual(Expected.Count, Actual.Count,
    '{Word counts should be equal}');
  for expectedPair in Expected do
  begin
    Assert.IsTrue(Actual.ContainsKey(expectedPair.Key),
      format('Actual doesn''t contain Expected "%s"',[expectedPair.Key]));
    Assert.AreEqual(expectedPair.Value, Actual[expectedPair.Key],
      format('{Expected %s: %d; Actual %s: %d}',
        [expectedPair.Key,
         expectedPair.Value,
         expectedPair.Key,
         Actual[expectedPair.Key]]));
  end;
end;

procedure WordCountTests.Validate_CompareDictionaries;
begin
  Expected.Add('r',5);
  Expected.Add('a',10);
  Expected.Add('n',15);
  Expected.Add('d',20);
  Expected.Add('o',25);
  Expected.Add('m',30);

  actual := TDictionary<String, Integer>.create(expected);

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Alternating_word_separators_not_detected_as_a_word;
begin
  Expected.Add('one',1);
  Expected.Add('two',1);
  Expected.Add('three',1);

  Actual := WordCount(',\n,one,\n ,two \n ''three''').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Count_one_word;
begin
  Expected.Add('word',1);

  Actual := WordCount('word').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Count_one_of_each_word;
begin
  Expected.Add('one',1);
  Expected.Add('of',1);
  Expected.Add('each',1);

  Actual :=  WordCount('one of each').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Multiple_occurrences_of_a_word;
begin
  Expected.Add('one',1);
  Expected.Add('fish',4);
  Expected.Add('two',1);
  Expected.Add('red',1);
  Expected.Add('blue',1);

  Actual := WordCount('one fish two fish red fish blue fish').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Handles_cramped_lists;
begin
  Expected.Add('one',1);
  Expected.Add('two',1);
  Expected.Add('three',1);

  Actual := WordCount('one,two,three').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Handles_expanded_lists;
begin
  Expected.Add('one',1);
  Expected.Add('two',1);
  Expected.Add('three',1);

  Actual := WordCount('one,\ntwo,\nthree').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Ignore_punctuation;
begin
  Expected.Add('car',1);
  Expected.Add('carpet',1);
  Expected.Add('as',1);
  Expected.Add('java',1);
  Expected.Add('javascript',1);

  Actual := WordCount('car: carpet as java: javascript!!&@$%^&').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Include_numbers;
begin
  Expected.Add('testing',2);
  Expected.Add('1',1);
  Expected.Add('2',1);

  Actual := WordCount('testing, 1, 2 testing').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Normalize_case;
begin
  Expected.Add('go',3);
  Expected.Add('stop',2);

  Actual := WordCount('go Go GO Stop stop').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Setup;
begin
  Expected := TDictionary<String, integer>.Create;
end;

procedure WordCountTests.TearDown;
begin
  Expected.DisposeOf;
  Actual.DisposeOf;
end;

procedure WordCountTests.With_apostrophes;
begin
  Expected.Add('first',1);
  Expected.Add('don''t',2);
  Expected.Add('laugh',1);
  Expected.Add('then',1);
  Expected.Add('cry',1);

  Actual := WordCount('First: don''t laugh. Then: don''t cry.').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.With_quotations;
begin
  Expected.Add('joe',1);
  Expected.Add('can''t',1);
  Expected.Add('tell',1);
  Expected.Add('between',1);
  Expected.Add('large',2);
  Expected.Add('and',1);

  Actual := WordCount('Joe can''t tell between ''large'' and large').countWords;

  CompareDictionaries(Expected, Actual);
end;

procedure WordCountTests.Multiple_spaces_not_detected_as_a_word;
begin
  Expected.Add('multiple',1);
  Expected.Add('whitespaces',1);

  Actual := WordCount(' multiple   whitespaces').countWords;

  CompareDictionaries(Expected, Actual);
end;

initialization
  TDUnitX.RegisterTestFixture(WordCountTests);
end.

uWordCount.pas

unit uWordCount;

interface

uses
  System.Classes, System.Generics.Collections, System.SysUtils;

  type
    TWordCount = class
      private
        FWords: TDictionary<String, integer>;
        FPhrase: String;
        FDelimiters: TList<String>;
        FValidCharacters: TSysCharSet;

        function StripLeadingAndTrailingApostrophes(Word: String): String;

      public
        constructor Create(Phrase: string);

        function countWords: TDictionary<String, integer>;

        procedure ScanString;

    end;

  function WordCount(const Phrase: String): TWordCount;

implementation

{ WordCount }

procedure TWordCount.ScanString;
var
  WorkString: String;
  Word: String;
  Index: Integer;
  WhiteSpace: Boolean;
  Character: Char;
  NextCharacter: Char;
  PhraseLength: Integer;
  Count: Integer;
begin
  // Create the dictionary of words found
  FWords := TDictionary<String, integer>.Create;

  // Make a temporary working copy of the phrase, forcing it to lower case
  WorkString := LowerCase(FPhrase);

  // Get the length of WorkString, so we can use it to control our loop
  PhraseLength := Length(WorkString);

  // Start the loop counter at zero
  Index := 0;

  // We look for words until the string is empty
  while Index <= PhraseLength do
    begin
      // We'll assume that the character isn't whitespace
      WhiteSpace := False;

      // Increment the loop counter.
      Inc(Index);

      // Then get the character at that position, but only while we're
      // still inside the string; reading past the end is out of range.
      if Index <= PhraseLength then
        Character := WorkString[Index]
      else
        Character := #0;

      // Have we finished scanning the string yet?
      if Index > PhraseLength then
        // Yes, so we're going to treat the end of the string as a delimiter
        WhiteSpace := True
      else
        // No, we're still in the string. Now, we see if this is a delimiter
        if FDelimiters.IndexOf(Character) <> -1 then
          // It is, so flag that we've found whitespace
          WhiteSpace := True
        else
          // It wasn't a single-character delimiter, so we need to see if
          // it's the start of a two-character delimiter
          if Character = '\' then
            // It's potentially a two-character delimiter. If it's at the
            // last position of the string, it can't be one, however.
            if Index < PhraseLength then
              begin
                // It's somewhere before the last position of the string, so
                // we need to check the next character to see if it's a
                // two-character delimiter.
                NextCharacter := WorkString[Index + 1];

                // If the next character is 'n' or 't', it's part of a
                // two-character delimiter and we need to skip it
                if (NextCharacter = 'n') or (NextCharacter = 't') then
                  begin
                    // It is a two-character delimiter, so flag it
                    WhiteSpace := True;

                    // and skip the second character.
                    Inc(Index);
                  end;
              end;

      // Have we found the end of the word yet?
      if WhiteSpace then
        // Yes, so we need to add valid words to the dictionary of words
        begin
          // Make sure any apostrophes - which can be valid - are only in
          // the word, not before or after it. A word made up solely of
          // apostrophes ends up empty here.
          Word := StripLeadingAndTrailingApostrophes(Word);

          // Is the word empty?
          if Word <> '' then
            begin
              // No, we can add it. Is the word in the dictionary already?
              if FWords.ContainsKey(Word) then
                // Yes, so get the count for the word, and increment it
                Count := FWords.Items[Word] + 1
              else
                // No, so we start the count at one
                Count := 1;

              // and add the new word and value, or set the new value
              // for an existing word.
              FWords.AddOrSetValue(Word, Count);
            end;

          // Reset the word for the next iteration
          Word := '';
        end
      else
        // We only add valid characters to words
        if CharInSet(Character, FValidCharacters) then
          // It's valid, so add it
          Word := Word + Character;
    end;
end;

function TWordCount.StripLeadingAndTrailingApostrophes(Word: String): String;
var
  Position: Integer;
  Character: Char;
begin
  // Start with the passed word
  Result := Word;

  // We start at the beginning of the word looking for leading apostrophes.
  // Deleting one shifts the rest of the word left, so the next character
  // to check is always at position 1.
  while (Result <> '') and (Result[1] = '''') do
    Delete(Result, 1, 1);

  // Now, we start with the last character of the word
  Position := Length(Result);

  // We'll keep looking until we either find a valid character that isn't
  // an apostrophe or reach the start of the string
  while Position > 0 do
    begin
      // Get the current character
      Character := Result[Position];

      // Is the current character an apostrophe?
      if (Character = '''') then
        // Yes, delete the apostrophe
        Delete(Result, Position, 1)
      else
        // While we're at it, we only allow valid characters. If this is a
        // valid character, we stop looking for trailing apostrophes.
        if CharInSet(Character, FValidCharacters) then
          break;

      // Time to move to the previous position
      Dec(Position);
    end;
end;

function TWordCount.countWords: TDictionary<String, integer>;
begin
  // Return the words dictionary
  Result := FWords;
end;


function WordCount(const Phrase: String): TWordCount;
begin
  // Instantiate the object using the passed phrase
  Result := TWordCount.Create(Phrase);
end;

constructor TWordCount.Create(Phrase: string);
begin
  // Save the phrase in our object
  FPhrase := Phrase;

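  // Create the delimiter list
  FDelimiters := TList<String>.Create;

  // and populate it. Real whitespace characters (tab, line feed and
  // carriage return) sit alongside the space and punctuation delimiters.
  FDelimiters.AddRange([' ', #9, #10, #13, ',', '!', '"', ';', ':', '?']);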

  // Set the valid characters to be in a word
  FValidCharacters := ['a'..'z', '0'..'9', ''''];

  // Scan the string and build the dictionary
  ScanString;
end;

end.
