Deciphering an unknown language is a challenge even for veteran linguists. But in July, MIT computer scientist Regina Barzilay showed that a computer can do the job well and with astonishing speed. She and her colleagues, Benjamin Snyder and Kevin Knight, developed a program that deciphered large chunks of Ugaritic, an ancient Middle Eastern language, in just a few hours.
Barzilay used a statistical approach that compared Ugaritic with Hebrew, a known related language. By assessing structural similarities between the two, her software calculated the probability that a particular Ugaritic word was a cognate of a selected Hebrew word, meaning that the two share a common ancestral root. (The French pain and Spanish pan are an example of a cognate pair; both mean “bread” and descend from the Latin panis.) Because Ugaritic had already been decoded by scholars, the MIT team was able to confirm the program’s success.
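To make the idea of scoring candidate cognates concrete, here is a toy Python sketch. It is not the MIT team's model: it assumes that plain spelling similarity (from Python's standard difflib module) can stand in for a cognate probability, and the function names cognate_score and best_match, along with the three-word Spanish list, are invented purely for illustration.

```python
from difflib import SequenceMatcher


def cognate_score(word_a: str, word_b: str) -> float:
    """Return a crude spelling similarity between 0 and 1.

    Assumption: this is only a stand-in for a real cognate probability,
    which would come from a statistical model of the two languages.
    """
    return SequenceMatcher(None, word_a, word_b).ratio()


def best_match(word: str, lexicon: list[str]) -> tuple[str, float]:
    """Pick the word in the known-language lexicon that scores highest."""
    scored = [(candidate, cognate_score(word, candidate)) for candidate in lexicon]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical mini-lexicon of a "known" language (Spanish).
spanish = ["pan", "agua", "noche"]

# Treat the French word as the "unknown" one and look for its closest partner.
match, score = best_match("pain", spanish)
print(match, round(score, 2))
```

In this toy setup the French pain pairs with the Spanish pan, echoing the cognate example above. A real decipherment system has to cope with different alphabets and with words that have no cognate at all, which simple spelling similarity cannot do.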
Barzilay thinks the software could tackle languages that no human has been able to crack, even when it is not obvious which known tongue an undeciphered language most strongly resembles. “This technique allows you to quickly test several candidate languages to see which is closest,” she says. She plans to set it loose on one of the dozen or so undeciphered ancient languages, perhaps beginning with Etruscan, once spoken in what is now northern Italy.