The puzzle of information packing—how to cram knowledge into the tiniest space possible—has fueled technological development at least since the emergence of Chinese characters some 8,000 years ago. In his new book, The Information, journalist James Gleick argues that information is “the blood and the fuel, the vital principle” of our lives. Delving deep into the history behind today’s data-driven world, Gleick explores the mysterious drumming language of the African talking drum, whose irregular rhythms carried messages through the jungles of the Congo. He considers musical compositions like Johann Sebastian Bach’s 18th-century “Well-Tempered Clavier” as data streams that could capture sounds as varied as wind, cricket chirps, or the clatter of a horse-drawn cart. But for Gleick the pivotal moment initiating our data-drenched era came in 1948, when mathematician Claude Shannon conceived of the bit as a unit of information. Shannon’s work propelled us headlong into the flood of blogs, emails, tweets, and news updates that shape our lives today.
In 1948 the Bell Telephone Laboratories announced the invention of a tiny electronic semiconductor, “an amazingly simple device” that could do anything a vacuum tube could do, and do it more efficiently. It was a crystalline sliver, so small that 100 would fit in the palm of a hand. In May scientists formed a committee to come up with a name. Transistor won out. “It may have far-reaching significance in electronics and electrical communication,” Bell Labs declared in a press release, and for once the reality surpassed the hype. The transistor sparked the revolution in electronics, setting the technology on its path of miniaturization and ubiquity. But it was only the second-most significant development of that year. The transistor was only hardware.
An invention even more profound and more fundamental came in a monograph spread across 79 pages of The Bell System Technical Journal in July and October. No one bothered with a press release. It carried a title both simple and grand—“A Mathematical Theory of Communication”—and the message was hard to summarize. But it was a fulcrum around which the world began to turn. Like the transistor, this development also involved a neologism: the word bit, chosen in this case not by a committee but by the lone author, a 32-year-old named Claude Shannon. The bit now joined the inch, the pound, the quart, and the minute as a determinate quantity—a fundamental unit of measure.
But measuring what? “A unit for measuring information,” Shannon wrote, as though there were such a thing as measurable, quantifiable information.
In 1949, when Claude Shannon took a sheet of paper and penciled his outline of the measures of information, the scale went from tens of bits to hundreds to thousands, millions, billions, and trillions. The transistor was one year old and Moore’s law yet to be conceived. At the top of his information pyramid was Shannon’s estimate for the Library of Congress—100 trillion bits, or 10¹⁴. He was about right, but the pyramid was growing.
After bits came kilobits, naturally enough. After all, engineers had coined the word kilobuck—“a scientist’s idea of a short way to say ‘a thousand dollars,’ ” The New York Times helpfully explained in 1951. The measures of information climbed up an exponential scale, as the realization dawned in the 1960s that everything to do with information would now grow exponentially. That idea was casually expressed by Gordon Moore, who had been an undergraduate studying chemistry when Shannon jotted his note and who later found his way into electronic engineering and the development of integrated circuits. In 1965, three years before he founded the Intel Corporation, Moore was merely, modestly suggesting that within a decade, by 1975, we would be able to combine as many as 65,000 transistors on a single wafer of silicon. He predicted a doubling every year or two—a doubling of the number of components that could be packed on a chip, but then also, as it turned out, the doubling of all kinds of memory capacity and processing speed, a halving of size and cost, seemingly without end.
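Moore’s extrapolation is simple compound doubling, and the arithmetic can be sketched in a few lines. (The starting count of about 64 components per chip in 1965 is an assumption here for illustration, not a figure from the text; doubling yearly from that base lands on Moore’s “as many as 65,000” by 1975.)

```python
def components(year, base_year=1965, base_count=64, doubling_years=1):
    """Components per chip under a simple doubling rule.

    The base count of ~64 components in 1965 is an assumed
    starting point for illustration, not a figure from the text.
    """
    return base_count * 2 ** ((year - base_year) // doubling_years)

# Ten doublings in a decade: 64 * 2**10
print(components(1975))  # 65536 -- Moore's "as many as 65,000"
```

Stretching the doubling period from one year to two only changes the exponent, not the shape of the curve: either way the growth is exponential, which is the point of the paragraph above.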
Kilobits could be used to express speed of transmission as well as quantity of storage. As of 1972 businesses could lease high-speed lines carrying data as fast as 240 kilobits per second. Following the lead of IBM, whose hardware typically processed information in chunks of eight bits, engineers soon adopted the modern and slightly whimsical unit, the byte. Bits and bytes. A kilobyte, then, represented 8,000 bits; a megabyte (following hard upon), 8 million. In the order of things as worked out by international standards committees, mega- led to giga-, tera-, peta-, and exa-, drawn from Greek, though with less and less linguistic fidelity. That was enough, for everything measured, until 1991, when the need was seen for the zettabyte (1,000,000,000,000,000,000,000) and the inadvertently comic-sounding yottabyte (1,000,000,000,000,000,000,000,000). In this climb up the exponential ladder, information left other gauges behind. Money, for example, is scarce by comparison.
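The byte arithmetic above is easy to make concrete. A minimal sketch, reading the prefixes in their decimal (SI) sense as the standards committees defined them:

```python
BITS_PER_BYTE = 8

# Decimal (SI) prefixes as adopted by the standards committees.
PREFIXES = {
    "kilo": 10**3,  "mega": 10**6,  "giga": 10**9,   "tera": 10**12,
    "peta": 10**15, "exa": 10**18,  "zetta": 10**21, "yotta": 10**24,
}

def bits_in(prefix):
    """Number of bits in one <prefix>byte."""
    return PREFIXES[prefix] * BITS_PER_BYTE

print(bits_in("kilo"))  # 8000 -- "a kilobyte ... represented 8,000 bits"
print(bits_in("mega"))  # 8000000 -- "a megabyte ... 8 million"
```

In practice, memory sizes have often been quoted in binary multiples instead, where a kilobyte means 1,024 bytes; the decimal reading used here is the one that matches the text’s 8,000 bits.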
After kilobucks, there were megabucks and gigabucks, and people can joke about inflation leading to terabucks, but all the wealth amassed by all the generations of humanity does not amount to a petabuck.
The 1970s were the decade of megabytes. In the summer of 1970, IBM introduced two new computer models with more memory than ever before: the Model 155, with 768,000 bytes of memory, and the larger Model 165, with a full megabyte, in a large cabinet. One of these room-filling mainframes could be purchased for $4,674,160.
By 1982 Prime Computer was marketing a megabyte of memory on a single circuit board, for $36,000. When the publishers of the Oxford English Dictionary began digitizing its contents in 1987 (120 typists; an IBM mainframe), they estimated its size at a gigabyte. A gigabyte also encompasses the entire human genome. A thousand of those would fill a terabyte. A terabyte was the amount of disk storage Larry Page and Sergey Brin managed to patch together with the help of $15,000 spread across their personal credit cards in 1998, when they were Stanford graduate students building a search-engine prototype, which they first called BackRub and then renamed Google. A terabyte is how much data a typical analog television station broadcasts daily, and it was the size of the United States government’s database of patent and trademark records when it went online in 1998. By 2010 one could buy a terabyte disk drive for a hundred dollars and hold it in the palm of one hand.
The books in the Library of Congress represent about 10 terabytes (as Shannon guessed), and the number is many times greater when images and recorded music are counted. The library now archives websites; by April 2011 it had collected 160 terabytes’ worth.
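The parenthetical “as Shannon guessed” can be checked with one line of arithmetic: 10 terabytes of text comes to 8 × 10¹³ bits, within a factor of 1.25 of the 10¹⁴ bits Shannon penciled in for the Library of Congress in 1949. A quick sketch:

```python
BITS_PER_BYTE = 8
terabyte = 10**12                              # one decimal terabyte, in bytes
library_bits = 10 * terabyte * BITS_PER_BYTE   # ~10 TB of books, in bits

shannon_estimate = 10**14                      # Shannon's 1949 pencil estimate
print(library_bits)                            # 80000000000000, i.e. 8 x 10**13
print(shannon_estimate / library_bits)         # 1.25 -- "about right"
```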