This Man Wants To Control the Internet

And you should let him...

By Carl Zimmer|Thursday, October 25, 2007
Photograph by Trujillo/Paumier

John Doyle is worried about the Internet. In the next few years, millions more people will gain access to it, and existing users will place ever higher demands on our digital infrastructure, driven by applications like online movie services and Internet telephony. Doyle predicts that this skyrocketing traffic could cause the Internet to slow to a disastrous crawl, an endless digital gridlock stifling our economies. But Doyle, a professor of control and dynamical systems, electrical engineering, and bioengineering at Caltech, also believes the Internet can be saved. He and his colleagues have created a theory that has revealed some simple yet powerful ways to accelerate the flow of information. Vastly accelerate the flow: Doyle and his colleagues can now blast the entire text of all the books in the Library of Congress across the United States in 15 minutes.

I travel to Pasadena to learn about Doyle’s work and am a bit flummoxed by his suggestion to meet not in his lab but in his gym. Doyle gets on a treadmill and begins to pound away. I turn on my machine and try to keep up. At 53, Doyle is long, lean, and hawk-faced. He is a championship athlete, and he works out furiously twice a day at least. It’s hard for me to catch enough air to ask Doyle questions, but he has no problem holding forth in response. As he talks I begin to realize his secret agenda in meeting me here. A treadmill is the perfect place to start to understand his ideas, because for Doyle, the world is filled with complex networks—and a body in the middle of a workout is a very good example of what those networks are all about.

A system of linked computers like the Internet is obviously a network, but so are jetliners, human bodies, and even bacterial cells. They’re all networks because they are made up of lots and lots of parts that work together. Robust networks have parts that continue to work together smoothly even if conditions fluctuate unpredictably. In the case of the Internet, a million people may try to send e-mail at once. In the case of Doyle’s body, here on his treadmill, its physiology holds steady even as he pushes himself to his limit. “Inside of you, everything’s going crazy,” Doyle says, “but it’s all keeping your body temperature steady and your body upright.”

Doyle knows, however, that networks that look perfectly sound can be headed for collapse with little warning. He has found that in order to achieve robustness, all systems must follow certain rules. Robustness doesn’t come cheap. As a system is tuned to become robust under one set of conditions, that tuning makes the system fragile under other, sometimes unexpected, conditions. Robustness and fragility go hand in hand. While Doyle pounds away on his treadmill, he offers his own body as exhibit number one. He has optimized his body to meet the grueling challenges of winning triathlons, but in doing so he has made his body vulnerable to problems that rarely plague a nonathlete. He has bad ankles, a groin injury, and other injuries earned over a lifetime of playing sports. In August, he almost died after falling down a rock face while hiking in Panama.

The treadmill slows to a stop. Doyle checks his pulse. “One of the reasons I’m so interested in robustness,” he says, “is that I’m so fragile.”

Doyle came to MIT in 1975 and fell in love with a science known as control theory. Control theorists, roughly speaking, try to understand how complicated things can run efficiently, quickly, and safely instead of crashing, exploding, or otherwise grinding to a halt. They analyze systems by modeling the variables that dictate how they will behave. But rather than checking through every possible combination of variables to see if, say, a plane will fly straight or stall when the wind picks up, control theorists look for underlying laws of control that can predict how something will behave using just a few key variables. “Control theory is at the center of modern technology,” Doyle explains.

As technology has become more complex over the past century, researchers have had to find new ways to control airplanes, factories, computers, and the like. Much of that progress has come by brute-force tinkering, but a lot of it has come from a growing understanding of the basic laws of control. Doyle began developing his own innovative ideas in control theory as an undergraduate. By 1976 he was consulting for Honeywell. By age 32 he had been hired with immediate tenure at Caltech.

Doyle made his mark by figuring out how to prove a system is robust. In the early 1980s, NASA asked him to look at the space shuttle. Several shuttles had already flown, but the agency wanted reassurance about their behavior during reentry. Using wind tunnels and computer simulations, NASA had come up with apparently stable designs, but there were too many variables to test everything. “You had this obscenely large space of possibilities,” Doyle says. “Somewhere lurking in there could be a crash, and you don’t know.”

Doyle looked at all the forces that might be exerted on a space shuttle due to atmospheric conditions, its velocity through the air, and so forth. NASA’s engineers had plotted these forces in a so-called multidimensional space—say, a pitching torque along one axis and a longitudinal acceleration along another. By developing new mathematical tools, Doyle proved that there was a volume of this multidimensional space, inside of which every combination of forces was certainly safe. Outside that region lurked disaster. The space shuttle design was lodged comfortably inside the safe region.

“Not only could we show it was safe, we could prove it,” Doyle says.

The techniques Doyle developed to test the space shuttle have become standard tools for testing new designs of airplanes and helicopters. But Doyle had a more fundamental question on his mind. Just how big could the volume of safety be made?

It is certainly possible to make things more robust—in other words, expand the safe region. A jetliner is far more robust than the world’s first airplane, the 1903 Wright Flyer. Doyle took up a question that had concerned researchers since the birth of control theory in the 1930s and ’40s: Were there any fundamental limits to the growth of robustness? He focused on one of the most important ways in which engineers make things more robust: by adding feedback loops. A jet can keep track of its movement, temperature, and a long list of other readings, and it can continually correct every one, adjusting itself to bring variables back into line. But Doyle showed how just cranking up robustness under some conditions creates new opportunities for failure. A jet is far more stable in high winds than the Wright Flyer, but on the other hand, it is vulnerable to software bugs that the Wright brothers never had to worry about. “You replace mechanical failure with lots of software failure,” Doyle says.

In the 1990s, studying complex systems of all sorts became something of a fad following the emergence of “chaos theory.” Competing versions of this theory were emerging left and right; chaos was being touted as the science of the future. Doyle was unimpressed by most of the new ideas. “It was clear to me that they were just so far off the mark,” he says. Doyle made up a name that combined all the trendy buzzwords he came across: “emergilent chaoplexity.”

One reason Doyle loathes emergilent chaoplexity is that it relies on superficial patterns. Doyle, by contrast, insists that his analyses draw from the gritty details of how things actually work.

As an example, Doyle points to what are known as scale-free networks. Many of these networks—interlinked sets of airports, friends, nerves in the body, and so on—have the same basic structure. A few nodes are highly connected hubs, while most other nodes have only a few connections. Any given small city airport probably connects to just a few others. Passengers rely on being able to transfer at a hub to reach most other places. But if you live in Chicago, you can take a direct flight from O’Hare Airport to hundreds of destinations.

Some researchers, like Albert-László Barabási at the University of Notre Dame, have argued that the Internet shares a similar structure and that this accounts for why the Internet keeps humming even when some of its systems fail. Since hubs are rare, failures involving them are even rarer. But should a hub fail, researchers warned, it would lead to catastrophe. Their warning made headlines, with CNN reporting in 2000: “Scientists Spot Achilles’ Heel of Internet.”

Doyle was not impressed. “Everybody who knew how the Internet worked was puzzled by all this,” he says. He decided to test the Achilles’ heel theory by joining up with a group of collaborators and mapping a section of the Internet in unprecedented detail.

In that map, they found no Achilles’ heel. The Internet does have a few large servers at its core, but those servers are actually not very well connected. Each one has only a few links, mainly to other large servers through high-bandwidth connections. Much of the activity that occurs on the Internet actually lies out on its edges, where computers are linked by relatively low-bandwidth connections to small servers; think about how many e-mails office workers send to people in their building compared with how many they send overseas. If one of the big links at the core of the Internet crashed, Doyle and his colleagues discovered, it would not take the Internet down with it. Traffic could simply be rerouted through other big links.

The Internet works spectacularly well, despite the fact that over the past 30 years it has expanded a million-fold, absorbing new technology from BlackBerries to the iTunes music store with hardly any major changes to the basic rules it uses to move data. Doyle now knows why. It’s not just the physical arrangement of cables and servers that makes the Net so robust. Doyle and his colleagues showed that the software that runs the Internet uses feedback, in much the same way a jetliner’s computer does. The Internet can sense changing conditions and adjust itself.

The Internet has two kinds of feedback. It maintains a constantly updated picture of the entire network so that messages can be directed along the fastest routes. It also breaks down those messages and encapsulates them inside standardized packets of data, a little like using the standardized waybills and boxes provided by FedEx. Each packet can take its own path through the Internet. As packets arrive at the recipient’s computer, the message fragments in each packet are extracted and reassembled. Critically, as each packet arrives, it sends back a receipt to the sender’s computer. In heavy traffic, some packets get lost. In response to lost packets, computers slow down the rate at which they send their data, reducing congestion.
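The slow-down-on-loss behavior described above can be sketched in a few lines. This is a minimal illustration, assuming the additive-increase, multiplicative-decrease rule that standard TCP congestion control is built around; the function names, the link capacity, and the rule that packets are lost only when the sender exceeds that capacity are simplifying assumptions, not details from Doyle's work.

```python
def aimd_step(rate, packet_lost, increase=1.0, decrease=0.5):
    """One feedback step: probe upward gently, back off sharply after a loss."""
    if packet_lost:
        return rate * decrease   # multiplicative decrease
    return rate + increase       # additive increase


def simulate_loss_based(capacity=10.0, steps=20, start=1.0):
    """Return the sending rate over time on a single hypothetical link."""
    rate = start
    history = []
    for _ in range(steps):
        history.append(rate)
        # Assumption: a packet is lost only when the sender overloads the link.
        lost = rate > capacity
        rate = aimd_step(rate, lost)
    return history


if __name__ == "__main__":
    print(simulate_loss_based())
```

The printed rates climb steadily, overshoot the link's capacity, get cut in half, and climb again, a sawtooth pattern in which the sender learns about congestion only after it has caused some.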

Together, these two types of feedback give the Internet a robustness more powerful than anyone anticipated. “These Internet engineers weren’t control theorists, but they built this incredibly robust network,” Doyle says. “Man, that’s awesome.” Then again, the engineers were doing something that evolution figured out long ago.

Back at Doyle’s messy Caltech office, he props his robust yet fragile body in a recliner by his desk and shifts the conversation from technology back to biology. Around the time Doyle began to use control theory to understand the Internet, he also began using it to explore the mechanism of life. If his ideas about control really were universal, he realized, then a cell ought to share some basic organizational principles with an airplane or the Internet—although finding the similarities might require some digging. “If you want to understand how airplanes fly, looking at birds helps, but you may end up thinking it’s all about flapping,” he says. “If you look at bats and insects, too, you’ll see how it’s lift and drag and things like that. You use them to understand the deep stuff.”

Control theorists have pondered living things for decades, but until recently they lacked the mathematical tools to analyze them as they would a technological system. Doyle and his colleagues have created some of those tools. In keeping with Doyle’s gritty real-world philosophy, he then set out to see how they applied to a common bacterium, Escherichia coli. He soon discovered remarkably precise parallels between living networks and technological ones.

When E. coli is heated to dangerous temperatures, for example, it can rapidly churn out thousands of heat-shock proteins, molecules that help protect the microbe’s workings. When the temperature falls, the heat-shock proteins quickly get dismantled. Doyle demonstrated that this behavior takes place through a series of feedback loops inside the bacterium, akin to the feedback loops that keep an airplane on autopilot steady even as the plane is buffeted by gusts.

Doyle is now tackling a far bigger network of genes in E. coli: the master network responsible for governing its metabolism. He and his team are probing the control systems that allow the microbe to eat many different kinds of sugar and transform them into the thousands of molecules that make up the bacterium. E. coli’s metabolism is nothing if not robust, able to easily withstand significant environmental fluctuations.

The reason the bacterium works so well, Doyle finds, is that it is organized in much the same way as the Internet. Both the Internet and E. coli are conceptually organized like a bow tie, with a broad fan of incoming material flowing into a central knot and then flowing into another broad fan of outgoing material. On the Internet, the incoming fan is made up of data from a huge range of sources: e-mail, YouTube videos, Skype phone calls, and the like. In E. coli, the incoming fan is made up of the many sorts of food it eats. As information and food move into their respective bow ties, they get homogenized: E. coli breaks down its food into a few building blocks, while the Internet breaks down its motley incoming data streams into streams of standardized packets.

From the knot, both bow ties then fan out. E. coli turns its building blocks into DNA, proteins, membrane molecules, and any other special ingredient it needs. On the Internet, data packets reach a computer, where they can be reassembled into the original e-mail, YouTube videos, Skype telephone calls, and the like.

A bow-tie organization allows both the Internet and E. coli to run quickly and efficiently. If E. coli (like all bacteria, indeed like all living things) did not have a bow tie, it would have to use a different set of enzymes to make each of the thousands of different molecules it needs from each type of food. Rather than use such a huge, slow system, E. coli just points all its metabolic pathways into the same bow-tie knot, making everything from the same raw materials. Likewise, the Internet’s bow-tie architecture means that it doesn’t have different ways to handle, say, e-mail traffic and instant-message traffic. Everything passes through as the same types of data packets.
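A rough count shows why the knot saves so much machinery. The sketch below assumes a deliberately simplified picture: one dedicated pathway per food-product pair without a bow tie, versus one pathway from each food into a single shared knot plus one pathway from the knot to each product. The figures of 20 food types and 2,000 products are hypothetical, chosen only to match the article's "thousands of molecules."

```python
def pathways_without_bow_tie(food_types, products):
    """Every product needs its own dedicated route from every kind of food."""
    return food_types * products


def pathways_with_bow_tie(food_types, products):
    """Each food feeds the shared knot once; each product is built from it once."""
    return food_types + products


if __name__ == "__main__":
    foods, molecules = 20, 2000  # hypothetical counts for illustration
    print(pathways_without_bow_tie(foods, molecules))  # 40000
    print(pathways_with_bow_tie(foods, molecules))     # 2020
```

Under these assumptions the cost grows with the product of the two fans without a knot, and only with their sum when everything passes through one.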

The bow-tie architecture also makes both the Internet and E. coli robust. If the type of incoming material changes rapidly—say, a surge in video traffic in the Internet’s case, or a new food source for the E. coli—the system can process that material without having to retool its entire metabolism to cope.

Another advantage of a bow tie is that it makes feedback control easy. Information travels back from a receiving computer to the sender, which can speed up or slow down its packets in response. E. coli’s metabolism is loaded with analogous feedback loops. Normally E. coli can synthesize all the amino acids it needs for making proteins. But if it can get a certain kind of amino acid from the environment, that information shuts down its own production line.

But as Doyle points out, improving robustness comes with a price. The bow-tie structure opens the door to a vulnerability that could prove very hard to fix. Because of the homogenization that occurs at the heart of the bow tie, it’s difficult to identify and block harmful agents. In the case of the Internet, it takes only a short piece of code to produce a digital virus that can spread quickly to millions of computers and cause billions of dollars of damage. In living organisms, real viruses hijack cells in much the same way.

Doyle thinks the similarity between E. coli and the Internet is no accident. As networks get big and complicated—either through the tinkering of Internet engineers or through millions of years of evolution—they must follow certain rules to stay robust. “There is an inevitable architecture,” Doyle says.

Over dinner, Doyle muses on how to deal with these fundamental vulnerabilities. He hasn’t found a way to improve biological reliability (yet), but he does think he can help address the Internet’s limits.

The current packet-receipt feedback system (known as TCP) has worked wonderfully for years to control the flow of Internet traffic, but it won’t be able to cope with the coming jam, when fridges will scan the RFID chip on a milk carton and send an alert when the expiration date arrives. “Whether we like it or not, [Internet equipment giant] Cisco will network everything. Soon our glasses will tell the kitchen they’re empty,” Doyle says. That vast amount of traffic will make the Internet catastrophically fragile. “We could wake up one morning and nothing works.”

Many Internet experts are also worried, and several of them, including Steven Low, another Caltech professor, have launched projects to save the network. Doyle is working with Low on his project, which is unusual in its simplicity. Their plan to speed up the Internet is simply to do a better job of paying attention to measurements of Internet traffic. Today computers sense Internet congestion by noticing how many packets they lose. That’s like trying to drive down a highway by just looking at what’s 20 feet ahead of you, constantly accelerating and then slamming on the brakes as soon as you see something.

Doyle and his coworkers enable computers to use more information about traffic flow, noting how long it takes for their packets to get to their destination. The less traffic, the shorter the time, and with these traffic reports on hand, their computers make much smarter decisions. The result is a string of victories in high-speed Internet communication competitions. In the last face-off in 2006, they managed to send 17 gigabits, about a full-length movie’s worth, each second across the Internet. Doyle smiles as he describes their success, a flash of the athlete’s spirit in his face. “You’re not just proving theorems,” he says. “It beats anything anyone else can do.”
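To see why travel time is a richer signal than packet loss, consider a generic delay-based rule, sketched here under stated assumptions. It is not the Caltech team's actual protocol. The sender compares the current round-trip time with the shortest one it has seen and steers toward keeping a small, fixed number of packets queued in the network. The link model, the parameter names, and all the numbers are hypothetical.

```python
def delay_based_step(window, base_rtt, rtt, target_queue=5.0, gain=0.5):
    """Move the window partway toward the size that the measured delay suggests."""
    # A longer round trip means packets are waiting in queues, so aim lower.
    desired = window * base_rtt / rtt + target_queue
    return (1 - gain) * window + gain * desired


def simulate_delay_based(pipe_size=20.0, base_rtt=100.0, steps=40, start=1.0):
    """Return the window over time on a single hypothetical link."""
    window = start
    history = []
    for _ in range(steps):
        history.append(window)
        # Assumption: delay grows in proportion to packets beyond the pipe's size.
        rtt = base_rtt * max(window, pipe_size) / pipe_size
        window = delay_based_step(window, base_rtt, rtt)
    return history


if __name__ == "__main__":
    print(simulate_delay_based()[-1])
```

In this toy model the window rises and then settles just above the pipe's size instead of repeatedly overshooting and halving, because rising delay warns the sender before any packet is dropped.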

Last year the Caltech team started operating a company, FastSoft, to market their protocol. In March they started selling a box about the size of a DVD player that you can plug into a server. In one test, a Fortune 500 company was able to speed up its transmissions 30-fold. But Doyle stresses that a real solution to the Internet crisis will require rethinking the control process from the bottom up.

“If someone said, ‘Do a radical redesign,’ I’d say we’re not ready yet,” Doyle confesses. “Going to the moon was trivial compared to dealing with this. We’ve got a research path, but there’s some hard math to be done.”
