Last October China’s Tianhe-1A took the title of the world’s most powerful supercomputer, capable of 2.5 petaflops, meaning it can perform 2.5 quadrillion operations per second. It may not hold the top spot for long, as IBM says that its 20-petaflop giant Sequoia will come online next year.
Looking ahead, engineers have set their sights even higher, on computers a thousand times as fast as Tianhe-1A that could model the global climate with unprecedented accuracy, simulate molecular interactions, and track terrorist activity. Such machines would operate in the realm called the exascale, performing a quintillion (that’s a 1 with 18 zeroes after it) calculations per second.
The biggest hurdle to super-supercomputing is energy. Today’s supercomputers consume more than 5 megawatts of power. Exascale computers built on the same principles would devour 100 to 500 megawatts—about the same as a small city. At current prices, the electric bill alone for just one machine could top $500 million per year, says Richard Murphy, computer architect at Sandia National Laboratories.
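As a rough check on that figure, a back-of-the-envelope calculation lands in the same ballpark. The electricity rate below is an illustrative assumption, not a number from the article:

```python
# Back-of-the-envelope electricity bill for a hypothetical 500-megawatt exascale machine.
# The $0.12-per-kilowatt-hour rate is an assumed illustrative price, not an article figure.
power_mw = 500                     # assumed worst-case draw from the 100-to-500 MW range
hours_per_year = 24 * 365          # 8,760 hours
rate_per_kwh = 0.12                # assumed electricity price, dollars per kWh

energy_kwh = power_mw * 1_000 * hours_per_year   # megawatts -> kilowatts -> kWh per year
annual_cost = energy_kwh * rate_per_kwh
print(f"Annual electricity bill: ${annual_cost / 1e6:.0f} million")   # roughly $526 million
```

Even at the low end of the range, 100 megawatts, the same arithmetic still yields a bill above $100 million a year.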
To avoid that undesirable future, Murphy is leading one of four teams developing energy-efficient supercomputers for the Ubiquitous High-Performance Computing program organized by the military’s experimental research division, the Defense Advanced Research Projects Agency, or Darpa. Ultimately the agency hopes to bring serious computing power out of giant facilities and into field operations, perhaps tucked into fighter jets or even into Special Forces soldiers’ backpacks.
The program, which kicked off last year, challenges scientists to construct a petaflop computer by 2018 that consumes no more than 57 kilowatts of electricity—in other words, it must be 40 percent as fast as today’s reigning champ, while consuming just 1 percent as much power.
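Those ratios follow directly from the numbers quoted above; the gigaflops-per-watt figure at the end is simply an implication of the stated targets, not a number from the article:

```python
# Sanity check of the Darpa target against the figures quoted in this article.
target_flops = 1e15          # goal: 1 petaflop per second
tianhe_flops = 2.5e15        # Tianhe-1A: 2.5 petaflops per second
target_power_kw = 57         # Darpa's power budget
current_power_kw = 5_000     # "more than 5 megawatts" for today's supercomputers

print(f"Speed ratio: {target_flops / tianhe_flops:.0%}")              # 40%
print(f"Power ratio: {target_power_kw / current_power_kw:.0%}")       # about 1%
gflops_per_watt = target_flops / (target_power_kw * 1_000) / 1e9
print(f"Implied efficiency: {gflops_per_watt:.1f} gigaflops per watt")  # ~17.5
```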
The teams that survive the initial design, simulation, and prototype-building phases may earn a chance to build a full-scale supercomputer for Darpa. Making the cut will demand a total rethink of computer design. Nearly everything a conventional computer does involves schlepping data between memory chips and the processor (or processors, depending on the machine). The processor carries out the programming code for jobs such as sorting email and making spreadsheet calculations by drawing on data stored in memory. The energy required for this exchange is manageable when the task is small, because the processor needs to fetch only a little data from memory. Supercomputers, however, power through much larger volumes of data, such as when modeling a merger of two black holes, and the energy demand can become overwhelming. “It’s all about data movement,” Murphy says.
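A toy cost model makes the point concrete. The picojoule figures below are invented for illustration, not measurements from the article or from any real machine, but they capture the imbalance Murphy describes: shuttling bytes off-chip costs far more energy than computing with them.

```python
# Toy energy model showing why data movement, not arithmetic, dominates the budget.
# The picojoule figures are illustrative assumptions, not values from the article.
PJ_PER_FLOP = 10             # assumed energy for one floating-point operation
PJ_PER_BYTE_OFFCHIP = 150    # assumed energy to move one byte from off-chip memory

def job_energy_joules(flops, bytes_moved):
    """Rough total energy for a job: arithmetic plus data movement."""
    return (flops * PJ_PER_FLOP + bytes_moved * PJ_PER_BYTE_OFFCHIP) * 1e-12

# A memory-hungry kernel that reads 8 bytes per operation spends nearly all
# of its energy just moving data.
flops = 1e12
bytes_moved = 8 * flops
movement_joules = bytes_moved * PJ_PER_BYTE_OFFCHIP * 1e-12
print(f"Data movement share: {movement_joules / job_energy_joules(flops, bytes_moved):.0%}")  # ~99%
```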
The competitors will share one basic strategy to make this back-and-forth more efficient. This technique, called distributed architecture, shortens the distance data must travel by outfitting each processor with its own set of memory chips. They will also incorporate similar designs for monitoring energy usage.
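A minimal sketch of that idea, with hypothetical names: each processor keeps its own slice of the data in dedicated local memory and computes on it in place, so only small partial results ever cross the interconnect.

```python
# Minimal sketch of distributed architecture: each processor holds its own slice of
# the data in dedicated local memory and computes on it in place, so only small
# partial results travel between nodes. Class and function names are hypothetical.

class Node:
    def __init__(self, local_data):
        self.local_memory = local_data            # memory chips dedicated to this processor

    def local_sum(self):
        return sum(self.local_memory)             # computed without leaving the node

def distributed_sum(dataset, num_nodes=4):
    chunk = len(dataset) // num_nodes
    nodes = [Node(dataset[i * chunk:(i + 1) * chunk]) for i in range(num_nodes)]
    # One number per node crosses the interconnect, instead of the whole dataset.
    return sum(node.local_sum() for node in nodes)

print(distributed_sum(list(range(1_000))))        # 499500
```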
Beyond that, the teams will pursue different game plans. “There’s competition as well as collaboration,” says Intel project leader Wilfred Pinfold, “and there won’t be just one answer.”
Sandia National Laboratories’ effort, dubbed X-caliber, will attempt to further limit data shuffling with something called smart memory, a form of data storage with rudimentary processing capabilities. Performing simple calculations without ever moving the data out of memory consumes an order of magnitude less energy than today’s supercomputers require. “We move the work to the data rather than move the data to where the computing happens,” Murphy says.
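Here is an illustrative sketch, not X-caliber’s actual design, of what it means for memory to carry out rudimentary processing: the processor receives a single answer instead of every stored value.

```python
# Illustrative sketch of smart memory: a storage bank that can run simple reductions
# itself, so the processor gets one result back instead of every stored value.
# The class and its methods are hypothetical, for illustration only.

class SmartMemoryBank:
    def __init__(self, values):
        self._values = values                     # data stays resident in the bank

    # Rudimentary in-memory operations: the work moves to the data.
    def count_above(self, threshold):
        return sum(1 for v in self._values if v > threshold)

    def total(self):
        return sum(self._values)

bank = SmartMemoryBank([3, 8, 1, 9, 4])
# The five stored values never cross the memory bus; only the answers do.
print(bank.count_above(5), bank.total())          # 2 25
```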
Intel’s project, called Runnemede, is wringing more efficiency from its system using innovative techniques that selectively reduce or turn off power to individual components, says Josep Torrellas, a computer scientist at the University of Illinois who is an architect with the team. He and his colleagues are designing chips with about 1,000 processors arranged in groups whose voltage can be controlled independently, so that each group receives only what it needs at a given moment.
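A simplified sketch of that kind of per-group control follows; the voltage levels and the thresholds in the policy are invented for illustration and are not Runnemede’s actual values.

```python
# Simplified sketch of per-group voltage control: idle groups are gated off,
# lightly loaded groups run at reduced voltage, busy groups get full power.
# The voltage levels and thresholds are invented for illustration.

VOLTAGE = {"off": 0.0, "low": 0.6, "high": 1.0}

def set_group_voltages(group_loads):
    """Map each group's utilization (0.0 to 1.0) to a supply setting."""
    settings = {}
    for group_id, load in group_loads.items():
        if load == 0.0:
            settings[group_id] = VOLTAGE["off"]    # power-gate idle groups
        elif load < 0.5:
            settings[group_id] = VOLTAGE["low"]    # lightly loaded: slow down
        else:
            settings[group_id] = VOLTAGE["high"]   # busy: full speed
    return settings

print(set_group_voltages({0: 0.9, 1: 0.2, 2: 0.0}))   # {0: 1.0, 1: 0.6, 2: 0.0}
```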
Graphics chip maker NVIDIA leads a third research thrust, called Echelon, which builds on the capabilities of the company’s graphics-processing chips. Such processors consume just one-seventh as much energy per instruction as a conventional processor, according to architecture director Stephen Keckler. The graphics chips efficiently execute many operations at once, in contrast to traditional processors that perform one at a time as quickly as possible. The Echelon team plans to combine its graphics processors with standard processors so that their computer can automatically choose the most appropriate combination for the task at hand.
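The sketch below illustrates the kind of choice such a system might make. The one-to-seven energy ratio echoes the figure quoted above, but the serial-code penalty and the cost model itself are invented for illustration; this is not NVIDIA’s actual scheduler.

```python
# Sketch of routing work between throughput (GPU-like) and latency (CPU-like) cores.
# The 1-to-7 energy ratio echoes the figure quoted above; the serial-code penalty
# and the cost model itself are invented for illustration.

ENERGY_PER_OP = {"throughput_core": 1.0, "latency_core": 7.0}   # relative units
SERIAL_PENALTY = 20.0   # throughput cores handle serial code poorly

def pick_processor(num_ops, parallel_fraction):
    """Return the core type with the lower estimated energy for this task."""
    gpu_cost = num_ops * ENERGY_PER_OP["throughput_core"] * (
        parallel_fraction + (1 - parallel_fraction) * SERIAL_PENALTY)
    cpu_cost = num_ops * ENERGY_PER_OP["latency_core"]
    return "throughput_core" if gpu_cost < cpu_cost else "latency_core"

print(pick_processor(1_000_000, parallel_fraction=0.95))   # throughput_core
print(pick_processor(1_000_000, parallel_fraction=0.30))   # latency_core
```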
Finally, the Angstrom project, based at MIT, is creating a computer that self-adjusts on the fly to reduce energy use. The system goes through a search process to optimize settings such as the number of processors in use, says Anant Agarwal, the MIT computer scientist who heads the project. In a computing first, it will even be able to automatically select algorithms based on their energy efficiency, he says. This self-regulation should help make life easier for software engineers working with the machine. “Other approaches often require programmers to worry about optimizing performance and energy use simultaneously, which is awfully hard to do,” Agarwal says.
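A minimal sketch of that kind of self-tuning, using wall-clock time as a stand-in for measured energy; the candidate algorithms and the selection policy are illustrative, not Angstrom’s actual mechanism.

```python
# Sketch of energy-aware algorithm selection: benchmark interchangeable algorithms
# on a small sample and keep the cheapest. Wall-clock time stands in for measured
# energy here; the candidates and the policy are illustrative, not Angstrom's design.
import time

def insertion_sort(data):
    result = list(data)
    for i in range(1, len(result)):
        key, j = result[i], i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result

def pick_most_efficient(candidates, sample_input):
    """Run each candidate on a sample and return the name of the cheapest one."""
    best_name, best_cost = None, float("inf")
    for name, algorithm in candidates.items():
        start = time.perf_counter()
        algorithm(sample_input)
        cost = time.perf_counter() - start        # stand-in for a measured energy cost
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name

sample = list(range(2_000, 0, -1))
print(pick_most_efficient({"builtin_sort": sorted, "insertion_sort": insertion_sort}, sample))
# On most machines the built-in sort wins by a wide margin.
```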
Though the Darpa challenge focuses on supercomputers, the technology it spawns will probably ripple throughout the industry, making its way into data centers, automotive computers, and cell phones. Today’s desktops rival the top supercomputers of the late 1980s; 2020 may find us using laptops that outperform Tianhe-1A. And if Darpa’s four teams succeed in building their ultraefficient machines, maybe we can even leave the chargers at home.
Buzz Words
Flops Floating point operations per second, a standard measure of computing power.
Exascale computing Supercomputing three orders of magnitude above the current frontier, with quintillions of calculations per second.
Smart memory A form of data storage with its own computing capabilities. Such memory reduces the need to move data to a processor.
Distributed architecture A multiprocessor computer system in which each processor has its own dedicated set of memory chips.