Crosswords are one of humanity’s most popular pastimes. They consist of a set of clues whose answers form a grid of overlapping words. The goal is to complete the grid.
Crosswords are one area of endeavor where humans significantly outperform machines. Indeed, the most prestigious competition is the American Crossword Puzzle Tournament, which is held every year and has always been won by a human.
In 2017, a machine called Dr Fill achieved a respectable 11th place out of over six hundred contestants. But a machine had never won the American Crossword Puzzle Tournament. Until last year.
Enter the Berkeley Crossword Solver, an automatic crossword-solving program developed by Eric Wallace and other members of the natural language processing group at the University of California, Berkeley, along with Matt Ginsberg, the original developer of the Dr Fill program.
First Place
The new program outscored all human competitors in the 2021 American Crossword Puzzle Tournament. “[This] marks the first time that a computer program has surpassed human performance at this event,” say the researchers. “Our system won first place against 1,100 top human solvers.”
Machines find crosswords difficult because they consist of several distinct types of clue that require different approaches to solve. For example, knowledge-based clues require an understanding of history or pop culture or other forms of trivia. The team give this as an example: CLUE: Book after Song of Solomon, A: ISAIAH.
A wordplay clue requires players to reason about anagrams, puns or words with similar meanings. For example, CLUE: One followed by nothing, A: TEN.
And common-sense clues require a practical understanding of how the everyday world works. For example, CLUE: Cause of a smudge, A: WETINK.
Answers consisting of more than one word are particularly tricky for programs because the number of possible answers increases dramatically.
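To see why, here is a small, self-contained sketch, not the solver’s method but a toy with an artificial word list, that enumerates the ways a single run of letters could be split into words; each extra plausible split is another candidate the program must weigh.

```python
# Toy illustration only: even a tiny (artificial) word list can split a
# single run of letters in more than one way, so multi-word answers
# multiply the candidates a program must consider.
WORDS = {"WET", "INK", "WE", "TIN", "K"}

def segmentations(letters, prefix=()):
    """Return every way to split `letters` into words from WORDS."""
    if not letters:
        return [prefix]
    splits = []
    for i in range(1, len(letters) + 1):
        head, tail = letters[:i], letters[i:]
        if head in WORDS:
            splits.extend(segmentations(tail, prefix + (head,)))
    return splits

print(segmentations("WETINK"))
# -> [('WE', 'TIN', 'K'), ('WET', 'INK')]
```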
By contrast, word definitions are often straightforward for a machine to find automatically by data mining. For example, CLUE: Tusked savanna dweller A: WARTHOG.
The Berkeley Crossword Solver’s approach is straightforward. It begins with a neural question answering algorithm that produces candidate solutions for all the clues, along with a score indicating how likely each candidate is to be correct.
This algorithm is trained on a vast database of over six million question-answer pairs from historical crosswords dating back 70 years. It also uses an open-source language model called GPT-2 to help segment answers into words.
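As a rough illustration of that first stage, the sketch below uses a deliberately crude stand-in: candidate answers of the right length are scored against a clue by simple word overlap with clues they have appeared alongside before. The tiny database, the overlap scoring and the function name are illustrative assumptions, not the Berkeley system, which relies on a neural model trained on those six million historical pairs.

```python
from collections import defaultdict

# Toy stand-in for the question-answering stage: score each known answer
# of the right length by word overlap between the new clue and clues it
# has appeared with before. The database, scoring rule and function name
# are illustrative, not the Berkeley system's neural model.
HISTORY = [
    ("Tusked savanna dweller", "WARTHOG"),
    ("African wild pig", "WARTHOG"),
    ("Book after Song of Solomon", "ISAIAH"),
    ("Cause of a smudge", "WETINK"),
]

def candidate_answers(clue, length):
    """Return (answer, score) pairs of the given length, best first."""
    clue_words = set(clue.lower().split())
    scores = defaultdict(float)
    for past_clue, answer in HISTORY:
        if len(answer) != length:
            continue
        overlap = clue_words & set(past_clue.lower().split())
        if overlap:
            scores[answer] += len(overlap)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(candidate_answers("Savanna dweller with tusks", 7))
# -> [('WARTHOG', 2.0)]
```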
Next, it uses these suggested answers to populate the grid. Because the candidate answers are imperfect, this step is tricky: conflicts arise wherever crossing answers disagree on a letter, forcing the program to prefer one candidate over another.
The Berkeley team use a technique called belief propagation to resolve these conflicts. Rather than simply favoring the candidates with the highest scores from the question answering stage, it searches for a fill that maximizes the expected overlap with the correct solution, letting the letters shared by crossing answers constrain one another.
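Here is what that looks like in miniature: one across slot and one down slot sharing a single cell. Each slot sends the other a message saying how much probability it puts on each letter at the shared cell, and combining that message with the original scores gives a belief that rewards agreement at the crossing. The candidates, scores and crossing positions below are invented for illustration; the real solver runs this kind of message passing over every slot in the grid.

```python
# Minimal belief-propagation sketch on two crossing slots. Candidates,
# scores and crossing positions are invented for illustration only.
across = {"TEN": 0.4, "NIL": 0.6}        # scores from the QA stage
down   = {"WETINK": 0.8, "SMUDGE": 0.2}
A_POS, D_POS = 1, 1                      # shared cell: across[1] == down[1]

def normalize(d):
    total = sum(d.values()) or 1.0
    return {k: v / total for k, v in d.items()}

def message(src, src_pos, dst, dst_pos):
    """For each dst word, the mass src puts on agreeing at the shared cell."""
    return {w: sum(p for v, p in src.items() if v[src_pos] == w[dst_pos])
            for w in dst}

belief_across = normalize({w: p * message(down, D_POS, across, A_POS)[w]
                           for w, p in across.items()})
belief_down   = normalize({w: p * message(across, A_POS, down, D_POS)[w]
                           for w, p in down.items()})

print(belief_across)  # -> {'TEN': 1.0, 'NIL': 0.0}
print(belief_down)    # -> {'WETINK': 1.0, 'SMUDGE': 0.0}
# The crossing overrules NIL's higher initial score: only TEN and WETINK
# can agree on the shared letter.
```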
The result is a solution that is often close to correct but contains a few small mistakes. So the final stage is a second pass at the puzzle, in which the program examines alternative solutions that are a small edit away, scores them, and repeats until no further improvement is possible.
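In outline, that final pass is a greedy local search, along the lines of the sketch below. The scoring function and the edit-proposal function are hypothetical placeholders (the toy demo simply counts letters matching a hidden target); only the control flow, propose small edits, keep any that score better, stop when nothing improves, reflects the process described above.

```python
# Greedy local search, in outline. `score_fill` and `neighbours` are
# hypothetical placeholders; only the control flow mirrors the
# refinement pass described above.
def refine(fill, neighbours, score_fill):
    """Keep accepting small edits that improve the score until none do."""
    best, best_score = fill, score_fill(fill)
    improved = True
    while improved:
        improved = False
        for candidate in neighbours(best):   # fills one small edit away
            score = score_fill(candidate)
            if score > best_score:
                best, best_score = candidate, score
                improved = True
    return best

# Toy demo: the "grid" is a single string, an edit changes one letter,
# and the score counts positions matching a hidden target.
TARGET = "WETINK"

def score_fill(fill):
    return sum(a == b for a, b in zip(fill, TARGET))

def neighbours(fill):
    return [fill[:i] + c + fill[i + 1:]
            for i in range(len(fill))
            for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c != fill[i]]

print(refine("WETONE", neighbours, score_fill))  # -> WETINK
```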
Capable Solver
The outcome is a highly capable crossword solving program. “Our system outperforms even the best human solvers and can solve puzzles from a wide range of domains with perfect accuracy,” say the team.
However, they are at pains to point out that crossword puzzles are not yet “solved”. Instead, the Berkeley Crossword Solver is optimized for certain kinds of puzzles popular in the US, such as the famous New York Times crossword. “Compared to existing approaches, our system improves exact puzzle accuracy from 57% to 82% on crosswords from The New York Times,” say the team.
But it does not perform well on other types. In particular, the program cannot tackle cryptic crosswords of the style that is popular in the UK. “Cryptic crosswords involve a different set of conventions and challenges, e.g., more metalinguistic reasoning clues such as anagrams, and likely require different methods from those we propose,” admit the Berkeley team.
That’s interesting work showing a significant advance over previous solvers. It also demonstrates that humans still rule the roost when it comes to cryptic crosswords; for how much longer, it is hard to tell.
Ref: Automated Crossword Solving: arxiv.org/abs/2205.09665