
I Wrote Crossword Software! In Haskell!

I’ve been posting the crossword puzzles I’ve been making on this blog. A fellow Recurse Center alum asked if I’d blogged about the software I wrote to help make these crosswords and I thought that was an interesting idea, so here we go!

I’ll first do a high-level overview, and then we’ll dive into some of the code for folks who are interested in the details of how the code actually works.

Crossword Software – Overview

The software itself is in two parts: a Vue.js frontend written in JS (github), and a web API written in Haskell (github). I did this because I wanted to practice writing Haskell, and I specifically wanted to write a web server in Haskell, so the architecture isn’t optimized for anything else.

The frontend uses what I jokingly refer to as a SFSPA – a “Single File Single Page Application”. Pretty much all of the useful code in the frontend is written in one ~1300 line Vue file, all in one Vue component. For projects of a fixed, small size like this, breaking everything out into its own component and having to pass data up and down the rendering tree just ends up slowing down development. If I ever decide to turn this into anything more than it is, I would probably go through the trouble to break it up. But for now it’s really easy to just have one data store, one component, etc. I spent very little time fighting with Vue, and a lot of time actually working on the fun parts of the code, which was great.

The UI is super important, actually – it’s probably more useful than the backend. It allows intuitive (to me) navigation of the crossword, and pressing “enter” brings up a list of words that fit in the currently selected word-space. Then, I can navigate through a list of potential words and see them “ghost” into the puzzle, so that I can check what looks like it will work with the cross-words.

The backend uses a historical dump of NYT crossword answers, sorted by number of appearances in puzzles. I wrote this four years ago so I don’t remember where I got the data.

The Haskell that I wrote is super overengineered. I was learning the language, so I put my effort into figuring out how the heck to write anything, versus writing anything particularly smart. The algorithm is brute force, but it’s “fast enough”. Someday perhaps I’ll be lucky enough to spend a few hours pairing with someone who really knows Haskell and I can really make this go, but right now I can use it to make crosswords and that’s good enough for me. Caveats aside, I’m still proud of the code – it’s the first significant application I wrote in Haskell.

The basic approach is that the backend responds to queries that look like “…A”, and then returns all of the words that match that string, where . represents an empty square. That query, for example, returns “AREA ARIA IDEA ASIA ASEA ETNA IOTA ELLA AURA ELIA” as the first page of results. You can see that the words get more obscure as you move down the list. (ARIA is up there because it has a lot of vowels, which increases its utility as a cross-word).
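Setting the trie aside for a moment, the matching behavior itself is easy to sketch. This is not the repo’s code – the function names here (`matches`, `completions`) are my own invention – but it captures what the endpoint does:

```haskell
import Data.List (nub)

-- '.' matches any letter; everything else must match exactly.
matches :: String -> String -> Bool
matches pattern word =
  length pattern == length word && and (zipWith matchChar pattern word)
  where
    matchChar '.' _ = True
    matchChar p  c  = p == c

-- Filter a frequency-sorted dictionary down to the words that fit,
-- so the most common answers come back first.
completions :: String -> [String] -> [String]
completions pattern = nub . filter (matches pattern)
```

Because the dictionary is already sorted by appearance count, `completions "...A"` surfaces the common answers first, which is exactly the more-obscure-as-you-scroll behavior described above.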

It can also attempt to solve an entire puzzle, although it does this via brute force, and therefore takes exponential(?) time. I’ve found that it can solve 5×5 puzzles pretty well, but anything above that often causes a timeout. I don’t use the “solve” functionality very often, although sometimes for a difficult final corner I’ll give it a try.
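The repo’s actual solver is its own thing, but the brute-force shape is roughly this. Everything below is a minimal sketch with invented types: a slot is just the list of grid-cell indices a word covers, and crossings fall out of slots sharing cells:

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

type Cell = Int      -- a grid square, identified by index
type Slot = [Cell]   -- a word-space: the cells it covers, in order

-- Try to write a word into a slot, respecting letters already
-- fixed by crossing words. Fails on any conflict.
place :: Map Cell Char -> Slot -> String -> Maybe (Map Cell Char)
place grid slot word
  | length slot /= length word = Nothing
  | otherwise = foldr step (Just grid) (zip slot word)
  where
    step (cell, ch) acc = do
      g <- acc
      case Map.lookup cell g of
        Just existing | existing /= ch -> Nothing
        _                              -> Just (Map.insert cell ch g)

-- Brute force: try every dictionary word in every slot,
-- backtracking whenever a placement conflicts.
solve :: [String] -> Map Cell Char -> [Slot] -> Maybe (Map Cell Char)
solve _    grid []     = Just grid
solve dict grid (s:ss) =
  case [ g' | w <- dict
            , Just g  <- [place grid s w]
            , Just g' <- [solve dict g ss] ] of
    (g:_) -> Just g
    []    -> Nothing
```

Each slot multiplies the number of candidate branches, which is why a 5×5 stays tractable but anything bigger starts hitting the timeout.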

Some functionality I would LOVE to add to this is the ability to designate just a few word spaces (e.g. “1 down, 4 across, 6 across” with spacing/letters info) and have it generate words that work in those spaces. Maybe someday – I think that’s mostly a matter of writing the API endpoint.

Anyway, this has helped me write quite a few crosswords. Writing about it now, I realize that since the solver isn’t particularly useful, I could probably load the NYT data right into the frontend and query it there in JS, which would be faster – but then I couldn’t claim that I’d written something useful in Haskell 🙂

Code Dive

Just in case it’s interesting to anyone, here’s a deeper dive into the Haskell code that runs the API. Let’s trace through the function calls to the /words endpoint, which is the endpoint that takes in a string like “…A” and returns “AREA ARIA” etc., as I mentioned above.

I’ll assume you are at least familiar with Haskell syntax (which you can learn for free, if you’re not!)

The main function for the entire server is in Server.hs and is:

main :: IO ()
main = do
    -- Debug text
    putStrLn "Loading Words..."
    
    -- Pull data (with an empty list of excluded files)
    (errors, results) <- loadData "data" []
    
    -- Print the errors (mapM_ runs the IO actions and discards the results)
    mapM_ print errors
    
    -- Generate a trie structure from the results.
    -- Each trie node has a "wordFrequency" if that node is a word
    let wts = map wordTrieFromFileResult results
    
    -- Get the port from an env var and start the server
    port <- getPort
    putStrLn $ "Server Started at http://localhost:" ++ show port ++ "/"
    
    -- Importantly, allow Cors, and also wrap this in a timeout func
    -- since the solve endpoint in particular can take a while to run
    run port $ allowCors $ timebound $ wordsApp wts

The words are stored in a Trie generated in wordTrieFromFileResult, which allows for easier recursive access (source).
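The real type definitions are in the linked repo, but based on the accessors the rest of the code uses (nodes, children, parent, char, wordFrequency), a simplified sketch of the shape – which may well differ from the actual source – looks like:

```haskell
import Data.List (find)

-- Simplified sketch of the trie types; the repo's definitions may differ.
newtype WordTrie = WordTrie { nodes :: [LetterNode] }

data LetterNode = LetterNode
  { char          :: Char
  , children      :: [LetterNode]
  , parent        :: Maybe LetterNode  -- for walking back up later (readUp)
  , wordFrequency :: Maybe Int         -- Just freq iff this node ends a word
  }

-- findLetter, as used below: pick out the child carrying a given letter.
findLetter :: [LetterNode] -> Char -> Maybe LetterNode
findLetter ns c = find ((== c) . char) ns
```

The parent pointer is what makes the “read the word backwards” trick in readUp possible, and the Maybe Int is the per-node frequency mentioned in the main-function comments.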

I’ll skip the code for building the tree (source) but once it’s built, we set up an API endpoint for looking up words:

findWordCompletions :: [WordTrie] -> Query -> Maybe String
findWordCompletions wts q = do
    -- Get the "word" query parameter
    wq <- lookup "word" q
    -- Unpack the Maybe value
    wqv <- wq
    -- Get the "page" query parameter (or 0, if not present)
    let page = case lookup "page" q of
                Just pgv -> fromMaybe 0 $ readMaybe (maybe "0" toString pgv) :: Int
                Nothing -> 0
    -- Call getCompletions'
    let completions = getCompletions' wts (toString wqv)
    -- Return just the words that we care about. I'm not entirely sure
    -- if this does extra work, or if it just computes the word
    -- completions lazily. Potential future optimization point!
    Just $ unwords $ take 10 $ drop (10 * page) completions
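On that laziness question: with ordinary Haskell lists, take 10 . drop (10 * page) only forces the elements it needs, so as long as the completions list is produced lazily, later pages are never computed. A quick way to convince yourself (pageOf is my name for it, not the repo’s):

```haskell
-- Page n of a (possibly infinite) list, ten items at a time.
-- Laziness means only the first 10 * (page + 1) elements are ever forced.
pageOf :: Int -> [a] -> [a]
pageOf page = take 10 . drop (10 * page)
```

Running pageOf 1 against the infinite list [1..] terminates immediately with [11..20], which it could not do if the whole list were forced.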

And then getCompletions' is the meat of this process. I can’t tell you why getCompletions' is the main function and getCompletions is the helper function. Usually the apostrophe means a strict version of the function, but it doesn’t look like that’s what I’m doing here. Not sure what I was thinking four years ago!

getCompletions :: [WordTrie] -> [String] -> String -> [String]
-- The completions for an empty string is an empty list
getCompletions _ _ "" = []

-- wts is the WordTrie, used is a list of words we've already used (and therefore should be ignored), and s is the search string.
getCompletions wts used s

    -- If there's a "." in the word, then we need to do a search. 
    -- First we calculate the words using getWords, and then remove any
    -- words from that list with (\\) which are already used.
    | '.' `elem` s = let foundWords = nub $ concatMap (getWords s) wts
        in foundWords \\ used
        
    -- If there's no "." in the word, no search is needed since it's
    -- already a full word!
    | any (isWord s) wts = [s]
    
    -- Handling any other case (which should never happen). Probably
    -- could rewrite this as an if/else, honestly?
    | otherwise = []

getCompletions' :: [WordTrie] -> String -> [String]
getCompletions' wts = getCompletions wts []
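The used-words bookkeeping above is just list difference: nub collapses duplicate hits across tries, then (\\) deletes one occurrence of each used word. A tiny standalone illustration (candidatesWithout is an invented name, not in the repo):

```haskell
import Data.List (nub, (\\))

-- Deduplicate the found words, then drop any we've already placed.
candidatesWithout :: [String] -> [String] -> [String]
candidatesWithout used found = nub found \\ used
```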

getWords, then, returns a list of words that match the search string:

getWords :: String -> WordTrie -> [String]
getWords s wt =
    -- sort the (word, frequency) pairs by the frequency
    let sort = sortBy (flip compare `on` snd)
    -- Return just the words, sorted
    in map fst $ sort $ getWords1 s wt

getWords1 (why not getWords'??) is almost the end of this particular deep-dive.

getWords1 :: String -> WordTrie -> [WordWithFreq]
-- Empty search? All done!
getWords1 [] _ = []

-- If we're looking at a ".", then we need to kick off searches
-- for every letter (in this case, `nodes w`). Looking at this now, I'm
-- pretty sure I could refactor this to just call getLeaves, which *also*
-- fans out when it finds a ".". Probably doing a lot of extra work here!
getWords1 ('.':cs) w =
    let leaves = concatMap (getLeaves cs) (nodes w)
    in produceWords leaves
-- If we have anything but a ".", then we look for just that letter
-- in the tree and start there. Finally, we call produceWords, which
-- will create letters out of the list of nodes that `getLeaves`
-- produces. This feels overengineered! 
getWords1 (c:cs) w = case findLetter (nodes w) c of
    Just n -> produceWords $ getLeaves cs n
    Nothing -> []

Finally, we have getLeaves, which returns a list of LetterNodes that match the string. If, for example, we had the search string “AB.” and the words “ABC ABD”, we would return both the “C” and “D” nodes.

getLeaves :: String -> LetterNode -> [LetterNode]
-- I had to ask about this when reading it again four years after I
-- wrote it! The pipe character here creates a list comprehension
-- (https://wiki.haskell.org/List_comprehension). In this case, we
-- have reached the end of the search string, so we check: if this
-- is a word (this node has a wordFrequency) then we return a list
-- containing just l. Otherwise, we return an empty list.
getLeaves [] l = [l | isJust $ wordFrequency l]

-- If we encounter a dot, we fan out and continue with all of the children,
-- since "." matches every character
getLeaves ('.':cs) l =
    concatMap (getLeaves cs) (children l)

-- Finally, if we have any actual character, we see if that character is
-- in the current node's children. If it is, we continue, if not, we have
-- found a non-word and can return an empty list. 
getLeaves (c:cs) l = case findLetter (children l) c of
    Just n -> getLeaves cs n
    Nothing -> []

produceWords takes in a list of letter nodes that match, and walks backward up the trie to actually figure out the strings that those letter nodes represent. As a side note, I’m really not a fan of where notation since it means you have to read the entire function to figure out what variables are defined!

produceWords :: [LetterNode] -> [WordWithFreq]
-- Call readUp on each of the letter nodes in the list
produceWords = map readUp

readUp :: LetterNode -> WordWithFreq
-- We take in a single letterNode
readUp l = case parent l of
        -- If that letternode has a parent, we call readUp_ which gives us
        -- the string associated with this letter node, and then we
        -- reverse that string (since we "read" it backwards). Finally, we
        -- include the word frequency in the return value for sorting.
        Just p -> (reverse (char l : readUp_ p), wf)
        -- When we hit the top, that means we're done!
        Nothing -> ([char l], wf)
        where
            -- Get the word frequency. Throws an error if there isn't a 
            -- word frequency (which should never happen, since we only
            -- call this function when we are trying to read a word, which
            -- by definition has a frequency).
            wf = fromMaybe (error "Can't readUp nonword") (wordFrequency l)

readUp_ :: LetterNode -> String
-- Helper function that actually recurses up the parent stack,
-- building the String along the way
readUp_ l = case parent l of
    Just p -> char l : readUp_ p
    Nothing -> [char l]

And that’s the main thing that this program does! Reading it again four years later, I think it’s really overengineered. Iterating over a list of words with a regex in javascript would probably be faster, but this was a really fun exploration of Haskell and allowed me to test out some interesting data structures that I don’t normally get to try!

From Grand Rapids,

Erty