Trust Me, I'm Your Software

All very complex computer programs will, at some time, fail. How often? No one knows; the programs are too complex to test. So where should we use them? How about in planes, nuclear power plants, weaponry...

By Evan I. Schwartz | Wednesday, May 01, 1996
Three years ago Britain’s nuclear regulatory agency sounded an unusual alarm. In an internal report that was leaked to the press, the agency implied that the Sizewell B nuclear power plant, which was about to begin operating on the Suffolk coast of England, just 90 miles from London, might not be safe. The problem was not one of the usual ones, such as the plant’s evacuation procedure or the disposal of its hazardous nuclear waste. Rather the danger lay in software designed to manage the reactor in the event of an emergency.

At issue was the reactor’s primary protection system (PPS)--100,000 lines of computer code that have the critical job of shutting down Sizewell B should its temperature suddenly begin to climb or should other conditions pose a danger. If the PPS were to fail during such an emergency, one of Europe’s most densely populated regions would be at heightened risk of a nuclear meltdown. Not surprisingly, then, Nuclear Electric, the operator of the plant, had put the PPS software through a battery of tests. The results were not comforting: the software failed almost half of them.

Confronted with these startling results, Nuclear Electric officials reacted with equally startling hubris. They blamed the failures not on the software but on the tests themselves. They then declared the reactor to be safe. “I’m prepared to say that the PPS delivers what we require of it,” says engineer Paul Tooley, who is in charge of reactor protection. “The chance of an accident is very small--incredibly small.” Precisely how small, however, nobody can say. Plant operators have no way of proving that the software that runs the PPS meets any particular safety goal. Yet to take into account the uncertainty over the software, Nuclear Electric lowered its overall expectation of the plant’s performance from one failure every 10,000 years to one every 1,000 years. Even so, some critics voiced doubts about the validity of these figures; those doubts were still in the air last summer, when the nuclear agency gave the plant the green light to operate at full power.

This case, though alarming, is not the first in which software has played a suspiciously large role in public safety. Nor will it be the last. Increasingly, complex software is proving to be the weak link in many types of engineering systems, both big and small. Microchips controlled by software now translate the flick of a pilot’s wrist into the movement of a wing flap, slow a braking car on a slippery road, and switch a commuter train from one busy track to another. Software is becoming integral to the operations of the latest medical devices, communications networks, and weapons. And all this is happening even though engineers know full well that the most thoroughly tested computer software is susceptible to bugs, glitches, and viruses.

“The good news is that technology is advancing at a magnificent pace,” says Peter Neumann, an expert in computer-related risk at SRI International, a research firm in Menlo Park, California. “The bad news is that guaranteed system behavior is impossible to achieve.”

Thus, as people who must inescapably use software, we are faced with a dilemma: the technology is more widely applicable than ever, but it has grown so complex that there’s no way to simulate it exhaustively or test how it will perform in the real world. Although a computer program may indeed have a finite number of lines of code, it is like a piano, whose mere 88 keys can yield a virtually infinite number of melodies.
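To put rough numbers on the piano comparison, here is a minimal sketch. It assumes a melody is nothing more than a sequence of single key presses (no rhythm, no chords), and it imagines a hypothetical tester that can check one billion melodies per second. Both assumptions are illustrative and are not taken from any real test rig.

```python
KEYS = 88  # keys on a piano, as in the comparison above


def melody_count(length: int) -> int:
    """Number of distinct sequences of `length` single notes."""
    return KEYS ** length


def years_to_test(length: int, tests_per_second: float = 1e9) -> float:
    """Years needed to try every melody at an assumed (hypothetical) test rate."""
    seconds = melody_count(length) / tests_per_second
    return seconds / (365 * 24 * 3600)


# The count grows by a factor of 88 with every added note.
for notes in (5, 10, 20):
    print(notes, melody_count(notes), f"{years_to_test(notes):.3g} years")
```

Even under these generous assumptions, a 20-note melody space could not be covered in any practical amount of time. The inputs to a large program multiply in the same way.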

Given such complexity, it should come as no surprise that the list of software failures in safety-critical systems has been growing rapidly in recent years. In the mid-1980s a microprocessor-controlled cancer-therapy machine called the Therac-25, made by Atomic Energy of Canada, ran amok in several U.S. and Canadian hospitals, shooting overdoses of radiation into the bodies of at least six patients, killing or seriously injuring them. In its investigation, the U.S. Food and Drug Administration found that in several of the cases a certain rarely invoked combination of keystrokes caused the machine’s software to go haywire.

During the Gulf War, a software glitch was partly responsible for interfering with a Patriot missile’s radar and tracking system, according to U.S. Army investigators. The faulty software threw off the missile’s timing by one-third of a second, enough for it to miss an incoming Iraqi Scud missile that, on February 25, 1991, killed 28 soldiers and wounded 98 others in a barracks in Saudi Arabia.

In a major telephone outage in the summer of 1991, a software fix itself was the problem. It was supposed to enhance the features of a phone network megaprogram installed in computers around the country. But its maker, DSC Communications in Plano, Texas, failed to test the upgrade along with the overall system. “It was only a minor change, so they assumed they could release it without testing it,” says Neumann. They were wrong. An error introduced with the upgrade crippled the network’s ability to handle heavy congestion. The problem took weeks to find and disrupted phone service in Pittsburgh, San Francisco, Los Angeles, and Washington, D.C.

Then there is the steady stream of relatively minor glitches that make us wonder about the increasingly software-controlled world in which we live. In November 1994 a software bug within the Oregon TelCo Credit Union’s new automated-teller-machine network allowed thieves with a stolen credit card to bypass what should have been a daily withdrawal limit of a few hundred dollars. By making bogus deposits totaling some $800,000, they were able to make over 700 withdrawals amounting to $350,000 before they were caught. In September 1995 a software snag in Bell Atlantic’s telephone switches sent emergency 911 calls in Richmond, Virginia, to a customer named Rosa Dickson. For a frantic half hour, until the problem was fixed, she fielded the calls herself and passed messages to the police. Then, in December, a software glitch fouled up communications between two computers at the New York Stock Exchange, delaying for an hour the opening of trading on a Monday morning. By the time the exchange’s massive trading system was back up, a backlog of selling momentum contributed to the Dow Jones’s biggest single-day point loss in four years.

Of course, all engineering creations are bound to fail eventually. What makes software particularly worrying, though, is that its sheer complexity can mask errors inherent in its design. Such basic design flaws are easy enough to test for in relatively simple hardware, such as an airplane engine. Any problem that does subsequently occur will most likely be caused by the chance failure of one engine part or another. But engineers can lower the risk of these random failures by installing an identical backup system that automatically takes over if something goes wrong. Experience has proved that the chance of the backup failing at the same time is infinitesimal. Engineers call this property failure independence. It means that they can find the failure rate of a single engine by testing it in a lab under simulated conditions or observing it in actual flights; if the engine breaks down an average of once every 10,000 flights, then they can be almost certain that the chance of both the primary and the backup engine failing on the same flight is one in 10,000 times one in 10,000, or a very comforting failure probability of once every 100 million flights.
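The arithmetic behind that figure can be written out as a short sketch. The function name is hypothetical, and the multiplication is valid only under the independence assumption the paragraph describes.

```python
from fractions import Fraction


def joint_failure_probability(p_primary: Fraction, p_backup: Fraction) -> Fraction:
    """Chance that both units fail on the same flight.

    Multiplying the two rates is only valid if the failures are independent.
    """
    return p_primary * p_backup


p_engine = Fraction(1, 10_000)  # one breakdown per 10,000 flights, from the text
p_both = joint_failure_probability(p_engine, p_engine)
print(p_both)  # 1/100000000, i.e. once every 100 million flights
```

Exact fractions are used here so the result reads directly as "one in 100 million" with no floating-point rounding.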

“All our experience in physics points to the fact that we can isolate computers in separate boxes, and the failure rates would be independent,” says Ricky Butler, a software design engineer at NASA’s Langley Research Center in Virginia. The rare exceptions to such hardware independence, he says, would be if both pieces of machinery were simultaneously struck by lightning or subjected to high-intensity radio interference.

But achieving the same sort of independence for software has been elusive. Software often fails because of faulty logic within the code itself or because of flaws in the original specifications, which are the technical descriptions of how the software is supposed to behave. Even if two programs designed from the same specifications are developed by separate programming teams, it is likely that the backup program will have the same flaws as the main one.

To illustrate the problem, computer scientist Nancy Leveson of the University of Washington took unclassified design requirements, obtained from Boeing, for a program that would use data from radar to shoot down enemy missiles and asked 27 programmers around the country to write it. When they were finished, she ran each program in a simulation using real radar data. Although most of the programs eliminated more than 99 percent of the missiles, they tended to fail in the same situations, on the same sets of input data. If a certain missile came in at a certain tricky angle, for instance, all 27 programs would fail to shoot it down.
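A toy model shows why shared blind spots undo the benefit of a backup. Everything in this sketch is made up for illustration: the size of the input space, the 1 percent failure rates, and which inputs each program fails on. None of it comes from Leveson's experiment.

```python
TOTAL_INPUTS = 10_000  # hypothetical size of the test-input space


def both_fail_rate(fails_a: set, fails_b: set) -> float:
    """Fraction of inputs on which the primary and the backup fail together."""
    return len(fails_a & fails_b) / TOTAL_INPUTS


# Each program fails on exactly 100 of the 10,000 inputs (1 percent).
primary = set(range(0, 100))

# A backup with unrelated flaws: it shares one bad input with the primary,
# which is the overlap that independence would predict (1% of 1% of 10,000).
backup_unrelated = set(range(99, 199))

# A backup with the same blind spots: 90 of its 100 bad inputs are shared.
backup_same_blind_spots = set(range(0, 90)) | set(range(5000, 5010))

print(both_fail_rate(primary, backup_unrelated))         # 0.0001
print(both_fail_rate(primary, backup_same_blind_spots))  # 0.009
```

In this made-up case the pair with shared blind spots fails together 90 times as often as the independence calculation would suggest, even though each program looks equally reliable on its own.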

These dual problems--complexity and the lack of independence--make it all but impossible to come up with numerical reliability ratings for software. So even though they may make the public feel better, ratings such as those attached to the Sizewell B nuclear plant--whether they predict a failure every 10,000 years or every 1,000 years--don’t really mean much. “They are not scientifically valid,” Butler concludes. That’s why some people protest the use of software in live, mission-critical applications.

The best answer to the software reliability problem, according to Butler and a growing number of software safety experts, lies in a system of programming known as formal methods. Since software is too complex to test in the real world, Butler says, the only solution is to take steps to ensure that the software is designed right in the first place and to prove mathematically that it works.

Formal methods begin with using mathematical equations rather than English sentences when specifying what a program is supposed to do. For example, instead of the chief engineer’s simply telling a programmer to write a program that moves an aircraft’s wing flap along a perfect 90-degree angle, a formal specification would lay out perhaps 50 different mathematical equations needed to accomplish the task. By proving that the underlying logic of these equations is sound, a mathematician can show that the software will do what it is supposed to do, even before any code is written. When the programmers then finish writing the actual computer code, a validation program would make sure that the code does what the equations specify. In this way, formal methods can prove software is safe on a rational rather than an empirical basis, says Butler, even before the first real-world test is run.
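The flavor of the idea can be suggested with a toy sketch, though it is far simpler than real formal methods. The specification is written as a precise condition rather than a sentence, and a small checker compares the code against it for every input in a bounded range. The flap limits, the function names, and the range of requests are all hypothetical. Exhaustively checking a finite range is only a stand-in for the mathematical proofs Butler describes, which cover all inputs without enumerating them.

```python
FLAP_MIN, FLAP_MAX = 0, 90  # degrees; hypothetical limits echoing the 90-degree example


def spec(requested: int, commanded: int) -> bool:
    """The specification as a condition, not a sentence.

    1. The commanded angle always lies within the flap's limits.
    2. A request that is already within the limits is passed through unchanged.
    """
    within_limits = FLAP_MIN <= commanded <= FLAP_MAX
    request_in_range = FLAP_MIN <= requested <= FLAP_MAX
    faithful = (commanded == requested) if request_in_range else True
    return within_limits and faithful


def clamp_flap_angle(requested: int) -> int:
    """The 'actual code' under check (a hypothetical implementation)."""
    return max(FLAP_MIN, min(FLAP_MAX, requested))


def check_against_spec(impl, domain) -> list:
    """Return every input in `domain` on which `impl` violates the spec."""
    return [r for r in domain if not spec(r, impl(r))]


violations = check_against_spec(clamp_flap_angle, range(-360, 361))
print(violations)  # [] means no violation for any whole-degree request in the range
```

A version of the code that forgot the lower limit, such as `lambda r: min(FLAP_MAX, r)`, would be flagged on every negative request, with no need to wait for a flight test to expose it.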

Even the most formal of formal methods, however, may not be enough when software is extremely complex--as it often is for, say, programs involving communications or medical technology. Such systems, says Leveson, would require teams of designers spouting theorems for eternity. What’s needed is more attention to the root cause of most software failures: human error. In the Therac-25 case, she notes, engineers failed to investigate the software bugs after initial accident reports. Even after incidents of radiation overdose were known, they contented themselves with minor adjustments to the machine. If managers made a concerted effort to address this issue, she believes, these kinds of mistakes would occur less often.

Still, they will occur. Inevitably, the news will feature fiascoes like the Federal Aviation Administration’s project to build a new air traffic control network. Back in 1983 the FAA estimated that its Advanced Automation System would cost $2.5 billion and be completed by the turn of the century. By 1994, however, cost estimates had escalated to more than $7.5 billion, and the FAA was forced to halt development and go back to the drawing board. A report by the General Accounting Office concluded that the FAA underestimated the technical complexity of the new systems, especially the software.

In the meantime, the nation’s air traffic control system is being upgraded in piecemeal fashion, with each part susceptible to its own array of software glitches. Last August, for example, a bug in a new program installed at a control center in Auburn, Washington, caused controllers on the ground to lose communications with pilots in the air for a full minute. An FAA spokesman said that the bug was traced to a single command in the computer code. Fortunately, no accidents occurred during the minute of downtime.

Those 60 seconds of silence, though, could convey a powerful message, should anyone care to listen. When the system failed, the controllers, because they couldn’t be sure that the built-in backup software wouldn’t fail also, intentionally took it off-line. They were right to do so. As it happened, in a real-world illustration of the problem of independence, the backup turned out to have the same bug.