The Urbie robot uses a binocular stereo camera pair.

Image courtesy of NASA

In January 2002, I was asked to give an opening talk and performance for NAMM, the annual trade show for makers and sellers of musical instruments. What I did was create a rhythmic beat by making the most extreme funny faces I could in quick succession. A computer was watching my face through a digital camera and generating varied opprobrious percussive sounds according to which funny face it recognized. Keeping a rhythm with your face is a new, strange trick. It tickles while you do it. We should expect a generation of kids to adopt the practice en masse any year now.

This is the sort of deceptively silly event that should be taken seriously as an indicator of technological change. My sense is that by the end of this decade, pattern-recognition tasks like facial tracking will become commonplace. On one level, this means we have to rethink privacy policy, since a network of security cameras could hypothetically determine, automatically, where everyone is and what faces they are making. But there are many other extraordinary possibilities. Imagine that your avatar in Second Life (or better yet, in fully realized, immersive virtual reality) conveyed the subtleties of your facial expressions at every moment. Wouldn’t that lead to a splendid new outpouring of creative interpersonal energy?

A deeper meaning for me is that science is gaining an ability to use formal descriptions of ideas like metaphor and similarity that were previously reserved for artists and poets (see Jaron’s World: Computer Evolution for possible implications regarding the future of scientific simulations). Having explicit, rigorous ways to describe the kinds of processes that go on inside brains will bring us closer to a scientific understanding of ourselves. Indeed, pattern-recognition technology and neuroscience are growing up together.

The software I used at NAMM was a perfect example of this intertwining. It was developed by a little company called Eyematic, where I served as chief scientist. The original project had begun under the auspices of Christoph von der Malsburg, a University of Southern California neuroscientist, and his students, especially postdoc Hartmut Neven. Christoph might be best known for his influential theory from the early 1980s that synchronous firing (when multiple neurons go off at the same moment) is important to the way that neural networks function.

In this case, Christoph was trying to develop hypotheses about what functions are performed by particular patches of tissue in the visual cortex—the part of the brain that receives input from the eyes. There aren’t yet any instruments that can measure what a large, complicated neural net is doing in detail, especially while it is part of a living brain, so scientists have to find indirect ways of testing their ideas about what’s going on in there. For instance, if a hypothesis about what a part of the brain is doing turns out to inspire a working technology, that certainly gives the hypothesis a boost.

These days, neuroscience can inspire practical technology rather quickly. Although Eyematic folded, Hartmut Neven and many of the original students started a successor company to salvage the software, and that company was swallowed up by Google last year. What Google plans to do with the stuff isn’t clear yet. I hope they’ll come up with some creative applications along with the expected searching of images on the Net.

Will the age of pattern recognition inspire more privacy anxieties or creative festivities? One determining factor might be whether enough people can develop practical intuitions about how these algorithms work. Informed users will take charge, while ignorant ones will be bamboozled. I’d like to do my little bit to help, so here is an attempt at a commonsense explanation of how image-recognition algorithms work.

I’ll start with a childhood memory. When I was a boy growing up in the desert of southern New Mexico, I encountered a simple example of pattern recognition in the dirt roads. The roads had wavy “corduroy” bumps created by the tires of previous cars; the spacing of the bumps was determined by the average speed of the drivers on the road. When your speed matched that average, the ride would feel less bumpy. You couldn’t see the bumps with your eyes except right at sunset, when the horizontal red light rays highlighted every irregularity in the ground. At midday you had to drive to perceive the hidden information in the road.

Digital algorithms must approach pattern recognition in a similarly indirect way, and they often have to make use of a common procedure that’s a little like running virtual tires over virtual bumps. It’s called the Fourier transform. A Fourier transform detects how much action there is at particular “speeds” (frequencies) in a block of digital information. The graphic equalizer display on many audio players, which shows the intensity of the music in different frequency bands, is a familiar example.
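To make the "virtual tires over virtual bumps" idea concrete, here is a minimal sketch of a naive discrete Fourier transform in Python. For each candidate frequency it runs a sinusoid over the data and measures how well the two line up. The road profile, the function names `dft_magnitudes` and `band_levels`, and the four-band "equalizer" are all hypothetical illustrations. Real software uses the much faster FFT rather than this O(n²) loop, and no particular audio player is claimed to compute its display this way.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive discrete Fourier transform.

    For each frequency bin k, "drive" a complex sinusoid over the samples and
    sum the products; the size of that sum says how much action there is at
    that frequency. Returns bins 0..n//2, normalized by the sample count.
    """
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        total = 0j
        for t, x in enumerate(samples):
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(total) / n)
    return mags


def band_levels(mags, n_bands):
    """Sum bin magnitudes into equal-width bands, like a crude graphic equalizer.

    Bin 0 (the average height, not a ripple) is skipped.
    """
    bins = mags[1:]
    width = len(bins) // n_bands
    return [sum(bins[i * width:(i + 1) * width]) for i in range(n_bands)]


# Hypothetical "corduroy road": 64 height samples with a strong bump every
# 8 samples plus a fainter ripple every 4 samples.
N = 64
road = [math.cos(2 * math.pi * t / 8) + 0.3 * math.cos(2 * math.pi * t / 4)
        for t in range(N)]

mags = dft_magnitudes(road)
# Strongest nonzero bin: 64 samples / 8-sample spacing = 8 bumps per window.
peak = max(range(1, len(mags)), key=lambda k: mags[k])

print(peak)                                          # 8
print([round(v, 3) for v in band_levels(mags, 4)])   # [0.5, 0.15, 0.0, 0.0]
```

The hidden regularity in the road, invisible in the raw list of heights, shows up as a single tall spike at bin 8. The fainter ripple appears as a smaller spike at bin 16, which lands in the second "equalizer" band.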