Unfortunately, the Fourier transform isn’t powerful enough to recognize a face, but it has a more sophisticated older brother, the Gabor wavelet transform, that can get us halfway there. This mathematical process identifies individual blips of action at particular frequencies in particular places, while the Fourier transform just tells you what frequencies are present overall.
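For readers who like to see the difference in numbers, here is a minimal one-dimensional sketch using NumPy. It is an illustration only, not the code used in any real face recognizer. The helper names (`gabor_kernel`, `blip`) and every parameter value are assumptions chosen for the demonstration. It places the same short burst of oscillation at two different positions in a signal. The Fourier magnitudes come out essentially identical, so they cannot say where the burst happened, while the Gabor response peaks near the burst itself.

```python
import numpy as np

def gabor_kernel(freq, sigma, half_width):
    """Complex 1-D Gabor wavelet: a sinusoid under a Gaussian envelope."""
    t = np.arange(-half_width, half_width + 1)
    envelope = np.exp(-t**2 / (2 * sigma**2))
    return envelope * np.exp(2j * np.pi * freq * t)

def blip(n, center, freq, sigma):
    """A short burst of oscillation at `freq` cycles per sample around `center`."""
    t = np.arange(n)
    envelope = np.exp(-(t - center)**2 / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * (t - center))

n = 512
early = blip(n, center=100, freq=0.1, sigma=8)   # same blip, early in the signal
late = blip(n, center=400, freq=0.1, sigma=8)    # same blip, late in the signal

# Fourier magnitudes: which frequencies are present overall, with no "where".
spec_early = np.abs(np.fft.rfft(early))
spec_late = np.abs(np.fft.rfft(late))
fourier_gap = float(np.max(np.abs(spec_early - spec_late)))

# Gabor responses: how strongly each position matches the wavelet.
kernel = gabor_kernel(freq=0.1, sigma=8, half_width=32)
resp_early = np.abs(np.convolve(early, kernel, mode="same"))
resp_late = np.abs(np.convolve(late, kernel, mode="same"))
where_early = int(np.argmax(resp_early))
where_late = int(np.argmax(resp_late))

print("largest difference between Fourier magnitudes:", fourier_gap)
print("Gabor response peaks at samples:", where_early, where_late)
```

A face recognizer would use two-dimensional wavelets at several sizes and orientations, but the principle is the same: the response carries a location as well as a frequency.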
In order to explain wavelets, I’m going to take us back to cars driving across bumpy surfaces—but this time I’m also going to invoke some preposterous imaginary aliens. Suppose for a moment that Earth was visited by huge, jovial aliens who enjoyed imprinting patterns on the deserts by pressing city-size coins down against the dirt. The aliens’ coins mimic ours: They have flattened sculptures of human faces on each side. Like the old roads I grew up with, the impressions left on the desert floor are invisible except at sunset.
Your job is to ride around on the desert, then find and recognize the faces without waiting until sunset. You can also invite friends to ride along with you. There are a lot of strategies you could use, but I’ll describe one here that works pretty well. First, assign each driver a specific face spot to look for. For instance, imagine you are looking for the left corner of the mouth. This is where the thin ends of the lips meet, and there may be an angular wedge of darkness in between them, depending on how open the mouth is.
You have no idea how the coin was rotated, where it was pressed to the ground, or how big the face on it is. So your best bet is to start riding around at random locations in spiral motions. Why spirals? They can match up equally well with an impression of the juncture of the lips regardless of its size or orientation. This is an important idea: Your driving strategy at the most minute level determines what kinds of results you can get in seeing the big picture.
You are looking for a spot where, as you spiral around, you feel two impressions that have a gulf in between them—corresponding to the corner where the two lips and the opening of the mouth meet. As you spiral out, the impressions and the gulf should get smoothly larger, just like real lips.
Think you’ve found your spot? Now you call your friends and tell them your GPS coordinates. All the drivers have been issued a simple outline of the face spots that make up a generic face, and now they use it as a guide. If a driver thinks she has the edge of a nostril, then other drivers will look in the most likely places for the other end of the mouth, the corners of the eyes, and so on. Philosophers take note: The generic face map is like Plato’s ancient idea of an ideal version of a thing.
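Here is one way the generic map could guide the other drivers, as a rough sketch rather than a description of any actual system. The `TEMPLATE` coordinates are made up, and the function `predict_spots` is hypothetical. It also assumes the first driver can report a rough size and orientation for the face along with the GPS position, which is a simplification of mine. Given those, the map is scaled, rotated, and pinned to the found spot to suggest where everyone else should start looking.

```python
import numpy as np

# Hypothetical generic face map in arbitrary units (x to the right, y up).
TEMPLATE = {
    "left_eye": (-1.0, 1.0),
    "right_eye": (1.0, 1.0),
    "nose_tip": (0.0, 0.0),
    "mouth_left": (-0.7, -1.0),
    "mouth_right": (0.7, -1.0),
}

def predict_spots(found_name, found_xy, scale, angle_deg):
    """Guess where the remaining spots lie, given one found spot.

    `scale` and `angle_deg` are the assumed size and rotation of the face
    relative to the template.
    """
    theta = np.deg2rad(angle_deg)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    anchor = np.array(TEMPLATE[found_name])
    found_xy = np.asarray(found_xy, dtype=float)
    guesses = {}
    for name, xy in TEMPLATE.items():
        if name == found_name:
            continue
        # Offset from the found spot in template units, then rotate and scale.
        offset = np.array(xy) - anchor
        guesses[name] = found_xy + scale * (rotation @ offset)
    return guesses

upright = predict_spots("mouth_left", (10.0, 20.0), scale=2.0, angle_deg=0.0)
turned = predict_spots("mouth_left", (10.0, 20.0), scale=2.0, angle_deg=90.0)
for name, xy in upright.items():
    print(name, np.round(xy, 2))
```

In practice the predicted positions would only mark the centers of search areas, since real faces deviate from the ideal map.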
It’s possible that the person whose image was impressed into the ground was covering part of his face with a hand, so you might not find all the spots you’re looking for. Even so, if you find a bunch of them, you can feel confident you’ve found a face. But whose face? And what expression is it making?
To answer these questions, you need to refer to details about all the face spots that have been found in previous expeditions. Fortunately, we have an extensive database of previously recognized lists of spots that are known to correspond to individual people, to certain facial expressions, and to other qualities, like age and sex. Your new list of spots won’t exactly match any entry in this database, but it’s easy to find the closest matches. By this point, you’re pretty good at finding faces impressed in the dirt.
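Finding the closest matches can be sketched in a few lines. This is a toy example under loud assumptions: the database entries, their labels, and the four-number descriptions of each face are all invented, and the function `closest_matches` is hypothetical. Spots that were never found (a hand over the mouth, say) are marked as missing and simply left out of the comparison.

```python
import numpy as np

def closest_matches(probe, database, k=2):
    """Rank database entries by distance to the probe, using only found spots."""
    found = ~np.isnan(probe)          # spots that were actually located
    scored = []
    for label, entry in database.items():
        diff = probe[found] - entry[found]
        distance = float(np.sqrt(np.mean(diff**2)))
        scored.append((distance, label))
    scored.sort()                     # smallest distance first
    return [label for _, label in scored[:k]]

# Invented database: one measurement per face spot for each known entry.
database = {
    "alice_neutral": np.array([0.9, 0.1, 0.4, 0.7]),
    "alice_smiling": np.array([0.9, 0.2, 0.8, 0.7]),
    "bob_neutral": np.array([0.2, 0.8, 0.4, 0.1]),
}

# A new list of spots; the third one was hidden, so it is recorded as NaN.
probe = np.array([0.88, 0.12, np.nan, 0.7])

best = closest_matches(probe, database, k=1)
top_two = closest_matches(probe, database, k=2)
print("closest match:", best)
print("two closest:", top_two)
```

A real system would compare far richer descriptions of each spot, but the idea of ranking stored entries by similarity and tolerating missing spots carries over.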
Where did that wonderful facial database come from? A lot of hard work—but mostly during the early phases of development. Initially, researchers had to grade the algorithm’s performance on a multitude of trials, retaining only those face patterns that gave correct results. The early stages of database gathering aren’t foolproof. A particular lab that has mostly clean-shaven engineers might initially fail to include examples from bearded guys like me, for instance. Later, once the system is performing reasonably well, it can gather more face patterns automatically. A Darwinian phase eventually begins, in which the algorithm evolves, ridding itself of incorrect face patterns and getting better and better with time.
There are striking parallels between what works in engineering and what is observed in human brains, including the Platonic/Darwinian duality: A newborn infant can track a simple diagrammatic face, but a child needs to see people in order to learn how to recognize individuals.
I’m happy to report that Hartmut’s group earned some top scores in a government-sponsored competition in face recognition. The National Institute of Standards and Technology tests these systems in the same spirit in which drugs and cars are tested; they’re important enough that the public needs to know which ones are trustworthy.
I’m less happy to report that I suffer from mild prosopagnosia, a subnormal ability to recognize faces. Computers are still not quite as good as people in general at recognizing faces, but the algorithms I’ve described here are already better at the task than I am.