0
2
0
3

basbossink's solution

to Word Count in the Haskell Track

Instructions
Test Suite
Solution

Word Count

Given a phrase, count the occurrences of each word in that phrase.

For example for the input "olly olly in come free"

olly: 2
in: 1
come: 1
free: 1

Hints

To complete this exercise you need to implement the function wordCount, that takes a text and returns how many times each word appears.

If it is your first time solving this exercise, it is recommended that you stick to the provided signature:

wordCount :: String -> [(String, Int)]

Later, it may be a good idea to revisit this problem and play with other data types and libraries:

  • Text, from package text.
  • Map, from package containers.
  • MultiSet, from package multiset

The test suite was intentionally designed to accept almost any type signature that makes sense, so you are encouraged to find the one you think is the best.

Getting Started

For installation and learning resources, refer to the exercism help page.

Running the tests

To run the test suite, execute the following command:

stack test

If you get an error message like this...

No .cabal file found in directory

You are probably running an old stack version and need to upgrade it.

Otherwise, if you get an error message like this...

No compiler found, expected minor version match with...
Try running "stack setup" to install the correct GHC...

Just do as it says and it will download and install the correct compiler version:

stack setup

Running GHCi

If you want to play with your solution in GHCi, just run the command:

stack ghci

Feedback, Issues, Pull Requests

The exercism/haskell repository on GitHub is the home for all of the Haskell exercises.

If you have feedback about an exercise, or want to help implementing a new one, head over there and create an issue. We'll do our best to help you!

Source

This is a classic toy problem, but we were reminded of it by seeing it in the Go Tour.

Submitting Incomplete Solutions

It's possible to submit an incomplete solution so you can see how others have completed the exercise.

Tests.hs

{-# OPTIONS_GHC -fno-warn-type-defaults #-}
{-# LANGUAGE RecordWildCards #-}

import Data.Bifunctor    (bimap)
import Data.Char         (toLower)
import Data.Foldable     (for_)
import GHC.Exts          (fromList, toList)
import Test.Hspec        (Spec, describe, it, shouldMatchList)
import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)

import WordCount (wordCount)

main :: IO ()
main = hspecWith defaultConfig {configFastFail = True} specs

specs :: Spec
specs = describe "wordCount" $ for_ cases test
  where
    -- Here we used `fromIntegral`, `fromList` and `toList` to generalize
    -- the tests, accepting any function that receives a string-like argumment
    -- and returns a type that can be converted to [(String, Integer)].
    -- Also, the words are lower-cased before comparison and the output's
    -- order is ignored.
    test Case{..} = it description $ expression `shouldMatchList` expected
      where
        expression = map (bimap (map toLower . toList) fromIntegral)
                   . toList
                   . wordCount
                   . fromList
                   $ input

data Case = Case { description :: String
                 , input       :: String
                 , expected    :: [(String, Integer)]
                 }

cases :: [Case]
cases = [ Case { description = "count one word"
               , input       = "word"
               , expected    = [ ("word", 1) ]
               }
        , Case { description = "count one of each word"
               , input       = "one of each"
               , expected    = [ ("one" , 1)
                               , ("of"  , 1)
                               , ("each", 1) ]
               }
        , Case { description = "multiple occurrences of a word"
               , input       = "one fish two fish red fish blue fish"
               , expected    = [ ("one" , 1)
                               , ("fish", 4)
                               , ("two" , 1)
                               , ("red" , 1)
                               , ("blue", 1) ]
               }
        , Case { description = "handles cramped lists"
               , input       = "one,two,three"
               , expected    = [ ("one"  , 1)
                               , ("two"  , 1)
                               , ("three", 1) ]
               }
        , Case { description = "handles expanded lists"
               , input       = "one,\ntwo,\nthree"
               , expected    = [ ("one"  , 1)
                               , ("two"  , 1)
                               , ("three", 1) ]
               }
        , Case { description = "ignore punctuation"
               , input       = "car : carpet as java : javascript!!&@$%^&"
               , expected    = [ ("car"       , 1)
                               , ("carpet"    , 1)
                               , ("as"        , 1)
                               , ("java"      , 1)
                               , ("javascript", 1) ]
               }
        , Case { description = "include numbers"
               , input       = "testing, 1, 2 testing"
               , expected    = [ ("testing", 2)
                               , ("1"      , 1)
                               , ("2"      , 1) ]
               }
        , Case { description = "normalize case"
               , input       = "go Go GO Stop stop"
               , expected    = [ ("go"  , 3)
                               , ("stop", 2) ]
               }
        , Case { description = "with apostrophes"
               , input       = "First: don't laugh. Then: don't cry."
               , expected    = [ ("first", 1)
                               , ("don't", 2)
                               , ("laugh", 1)
                               , ("then" , 1)
                               , ("cry"  , 1) ]
               }
        , Case { description = "with quotations"
               , input       = "Joe can't tell between 'large' and large."
               , expected    = [ ("joe"    , 1)
                               , ("can't"  , 1)
                               , ("tell"   , 1)
                               , ("between", 1)
                               , ("large"  , 2)
                               , ("and"    , 1) ]
               }
        ]
{-
Copyright 2014 Bas Bossink (bas.bossink@gmail.com).
-}
module WordCount (wordCount) where

import Data.Char (isAlphaNum, toLower)
import Data.Foldable (foldl')
import Data.Map.Strict (Map, empty, insertWith)

wordCount :: String -> Map String Integer
wordCount = tallyWords . words . map normalize

tallyWords :: [String] -> Map String Integer
tallyWords = foldl' tallyWord empty 
  where 
  tallyWord accumulator word = insertWith (+) word 1 accumulator

normalize :: Char -> Char
normalize c 
  | isAlphaNum c = toLower c
  | otherwise = ' '

What can you learn from this solution?

A huge amount can be learnt from reading other people’s code. This is why we wanted to give exercism users the option of making their solutions public.

Here are some questions to help you reflect on this solution and learn the most from it.

  • What compromises have been made?
  • Are there new concepts here that I could read more about to develop my understanding?

Community Comments

See what others have said about this solution
over 4 years ago
houshuang says

Mine is a lot shorter (50%) but arguably not as readable.

over 4 years ago
sjakobi says

I'll take the copying as an implicit compliment :)

I think that tallyWords is a lot clearer now!

over 4 years ago
basbossink says

Decided to follow up with a revision of iteration 7. I believe this solution is easier to read and understand than the use of the fromListWith construction in iteration 8. Normalizing the input first using the noramlize function, shamelessly stolen from sjakobi's word-count also cleans up tallyWords quite a lot. I've tried to simplify tallyWord but can't seem to find anything that is easier to read/understand. Any suggestions?