The builders of mobile gadgets face a paradox. They want to make the most powerful device they can, squeezed into the smallest box possible. But for a device to be useful, human beings have to be able to interact with all its features. More and more functions mean more and more buttons—and humans have stubbornly remained the same size and shape. A button can be made only so small before it becomes impossible to press, putting a tough limit on miniaturization. Different devices confront this paradox in different ways: Cell phone keypad buttons routinely do double, triple, and even quadruple duty, while devices like tablet computers use touch screens and gesture recognition.
AT&T is developing another solution. It wants you to be able simply to talk to an electronic device and have it follow your instructions. While some cell phones already offer voice recognition for basic tasks, such as looking up phone numbers in a contact list, AT&T envisions devices that can handle much more complicated voice commands, such as “Tell me where I can find the nearest ATM” or “Order me a pepperoni pizza.”
For decades AT&T has been working on a voice recognition system that can handle just such requests. Known as Watson, it is so complex that it is more practical to run the software on centralized servers than to install, manage, and maintain it on countless mobile devices. Fortunately, Internet connectivity is something today’s mobile devices have in abundance. Any device that includes some very basic hardware and software to capture and compress speech (which phones already possess) can be given the gift of voice recognition. Captured speech is sent, via the Internet or a cell phone network, to AT&T computers running Watson. The Watson software analyzes the speech and sends back a digital response that the device can translate into commands. To demonstrate the principle, AT&T researchers have built a voice-operated television remote control. Designed to work with AT&T’s Internet TV service, U-verse, the remote lets you do things like ask it to find any comedies that might be on TV now or to search the listings for movies starring Bruce Willis.
AT&T is already working with developers to create prototypes for other real-world applications, such as a yellow pages application for the iPhone, and expects to make more announcements about the future of this technology in the next few months.
How It Works
AT&T’s networked voice recognition system is a mash-up: software that uses the Internet to glue together different programs with different capabilities. Here, the goal is to merge a general voice recognition application, Watson, with things like information databases or the specialized software that runs a cable television or digital video recorder. In the example of a TV remote control, the remote captures speech from the user (“I want to see Channel 114”), compresses it, and uses a wireless connection to send it to a server running Watson. Watson not only recognizes individual words but can also be programmed to extract some meaning from simple sentences. It does this using sets of rules that reduce a variety of naturally spoken sentences to standardized text, so that, for example, “What is the time?” means the same thing as “Tell me the time.” Software running on the device can then translate the text into actual machine commands, such as transmitting to a television the signal to select a particular channel.
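The last two steps, rules that fold different phrasings into standardized text and device software that turns that text into a command, can be illustrated with a toy sketch. This is not Watson’s actual rule format or interface, which the article does not describe. The rule table, the `GET_TIME` and `SET_CHANNEL` labels, and the function names `normalize` and `to_device_command` are all assumptions invented for illustration, and the sketch starts from already-recognized words rather than audio.

```python
import re
from typing import Optional

# Hypothetical rule table: each pattern folds several phrasings into one
# standardized text form. The labels are invented for this sketch.
RULES = [
    (re.compile(r"^(?:what is the time|tell me the time|what time is it)$"),
     "GET_TIME"),
    (re.compile(r"^(?:i want to see|show me|go to|switch to) channel (\d+)$"),
     r"SET_CHANNEL \1"),
]


def normalize(utterance: str) -> Optional[str]:
    """Server-side stand-in: map recognized words to standardized text."""
    # Lowercase, drop punctuation, and collapse runs of whitespace.
    text = re.sub(r"[^a-z0-9 ]", "", utterance.lower())
    text = re.sub(r"\s+", " ", text).strip()
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            # Fill any captured values (such as the channel number) into the template.
            return match.expand(template)
    return None  # no rule matched this sentence


def to_device_command(standard_text: str) -> dict:
    """Device-side stand-in: translate standardized text into a command."""
    verb, _, arg = standard_text.partition(" ")
    if verb == "SET_CHANNEL":
        return {"action": "tune", "channel": int(arg)}
    if verb == "GET_TIME":
        return {"action": "show_clock"}
    raise ValueError(f"unsupported command: {standard_text!r}")


if __name__ == "__main__":
    for spoken in ("What is the time?", "Tell me the time.",
                   "I want to see Channel 114"):
        standard = normalize(spoken)
        print(spoken, "->", standard, "->", to_device_command(standard))
```

In this sketch the two time questions produce the same standardized text, which is the point of the rule step: the device only needs to understand a small, fixed vocabulary of commands, however the user happens to phrase the request.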