In a previous post I described mathematicians’ ongoing search for key properties of prime numbers. That effort may seem to belong entirely within the realm of pure mathematics; but surprisingly, the importance of primes goes far beyond the abstruse obsessions of ivory-tower mathematicians. In fact, the use of prime numbers underlies some of the most dramatic events in the news these past weeks: the story behind Edward Snowden’s revelations that the National Security Agency (NSA) is snooping on the communications of both American citizens and European diplomats.
While the Europeans have protested about their internal communications being intercepted by the NSA—ironically—the tools that one can use for protection from spying by anyone are readily accessible online, in the professional literature, and in publicly-available manuals and textbooks. These methods all rely on clever uses of prime numbers.
The essentials of these techniques are far from new. The foundations of a program to create codes so powerful that they could not be broken even if an eavesdropper were to use the entire available worldwide computing power were laid more than 35 years ago. The year 1976 saw the development of the Diffie-Hellman key exchange method (named after Whitfield Diffie and Martin Hellman; the names Ralph Merkle, James Ellis, Clifford Cocks, and Malcolm Williamson are often also associated with it); and the following, 1977, witnessed the appearance of the RSA algorithm. Both methods have advanced over the past three and a half decades, but information about their extensions is also readily available to anyone.
How do these techniques work? I will explain both methods here—necessarily in a simplified way. (Those interested in learning more can read some of the articles in the links that appear throughout this post.)
Alice sends Bob a secret message
The Diffie-Hellman key exchange idea has been described in a clear and concise way using an analogy by Terence Tao, whose work on prime numbers I mentioned in my previous post. The idea is as follows. Alice wants to send Bob a secret message (cryptographers prefer to use “from Alice to Bob” instead of the mundane “from A to B”) and she wants to prevent Eve (the “eavesdropper”) from reading it. So Alice places the message in a box, puts a good lock on it, keeps the key, and sends the package to Bob. (If Alice were to separately send Bob the key, there would be a chance that Eve could intercept both the package and the key.)
Bob has no key to Alice’s lock. So what he does instead is to put his ownlock on the box. And he now sends the package back to Alice, locked twice: using both her lock and his. Alice gets the package, removes her own lock using her key, and then sends the box, still safe because it bears Bob’s lock, back to Bob. Now Bob uses his key, opens the box, and gets the message! Each person here used his or her own lock and key—and yet a message was passed perfectly safely from Alice to Bob.
The digital version
This idea is implemented digitally in the Diffie-Hellman key exchange. The message to be sent from Alice to Bob is a secret number, call it n. Alice’s “key” is an exponent, a, which she chooses, and then uses it to raise n to. So the “locked box with the message” that Alice sends Bob is na. Bob has his own “key,” which is a number of his own choosing, b, that he uses as an exponent. He doesn’t know n or a, but he has na, which he got from Alice, so he raises this number to the power b. He thus sends Alice the “box with the two locks”: nab. Alice’s using her own key to open her own lock means her taking the ath root of nab, which, from the simple math of exponents, we know gives her nb, which she now sends back to Bob. Using his “key,” his exponent b, Bob takes the bth root of nb, and he thus obtains the secret number n that Alice wanted to convey to him.
Creating stronger codes with primes
It is possible to send a secret number from Alice to Bob as I just described, and if the numbers are large enough, one would have a reasonable probability that the number might not be deduced by Eve. In actuality, however, modern implementations of the Diffie-Hellman key exchange use more sophisticated elements to make it more difficult to break the code. And the secret number is not sent from Alice to Bob, but rather deduced by both of them using the formula nab (which, of course, is also equal to nba).
Alice and Bob choose a prime number, which they assume can be known to Eve, or to anyone in the world. Let’s say that this number is 11. They then do all calculations using the mathematical multiplicative group of integers modulo 11 (like a clock going around to 12 and then starting from 1, this group starts to count again after reaching 11). They also choose a base, and let’s suppose it is the number 5. Alice then chooses her secret number, say 3. Independently, Bob chooses his secret number, 4.
Alice raises the commonly-agreed-on base of 5 to the power of her secret number 3, and does the calculation modulo 11. She gets: 53 = 125, but 125 modulo 11 is 4 (it’s the remainder of dividing 125 by 11, which gives 11 and a remainder of 4—it acts like 16 hours in a clock, but this clock is based on 11 rather than 12). She sends Bob the answer, the number 4. Recall that Bob had chosen a secret number of 4, so he raises the 4 he got from Alice to the 4th power, modulo 11, and this gives him 44 = 256, but 256 modulo 11 is 3 (because 11×23 = 253, leaving the remainder 3), which is his final answer.
Alice gets from Bob the original 5 they had both agreed on, but now raised to the power of his secret number, 4, modulo 11, which is 625 modulo 11, which is 9 (as 11×56 = 616, leaving a remainder of 9). She then raises this number to the power of her secret number of 3, again doing this calculation modulo 11. She gets the same number that Bob got, 3 (because 93 = 729, but modulo 11 it is 3, since 11×66 = 726, which leaves a remainder of 3).
Using this complicated modular arithmetic based on a prime number, but essentially raising a number to hidden powers as in the previous section, Alice and Bob establish a common secret number, in this example, 3. Modular arithmetic using prime numbers helps make the algorithm much more difficult to decipher by an eavesdropper.* In reality, the prime number is large, and so are the other numbers. When Alice and Bob use secret numbers 100 digits long, the common number jointly deduced by Alice and Bob cannot be learned by Eve even if she has access to all the world’s available computing power.
Once Alice and Bob have established a common secret number, they can use it as a key to encrypt messages from one to the other and should have a high probability that their communication will not be deciphered by an outsider.
Two keys are better than one
The year after the Diffie-Hellman algorithm was published, three academics then working at MIT—Ron Rivest, Adi Shamir, and Leonard Adelman—came up with a brilliant idea for encrypting messages. What they tried to do was to avoid the stage in which Alice and Bob must create a common secret number, since this stage slows down the communication between them.
The three MIT scientists developed the notion of a pair of keys: a public key and a private key, which are then jointly used for communicating secret messages. The public key can be published and known to all. Its use saves time. The private key is a secret that Bob keeps, allowing him to decipher coded messages from Alice (or from anyone who knows his public key). Bob publishes his public key, which is a large number. This number is obtained when he multiplies together two very large prime numbers, known only to him (they constitute his private key). When Alice wants to send Bob a secret message, she encrypts it using his known public key. But in order to decrypt the message, one would need to know Bob’s private key, which is the two prime numbers he had used to create his publicly-known key. Supposedly, only Bob can do this.
Encrypting and decrypting messages using the RSA algorithm is a complicated mathematical procedure that relies on modular arithmetic and prime numbers similarly to the way they are used in the description of the Diffie-Hellman system above. But it is more sophisticated so that it can allow deciphering using only the private key. The public key alone is useless for deciphering the RSA code.
The essential element of RSA is the fact that the public key is composed of the product of two very large unknown prime numbers. It so happens that factoring a number into its prime components is very difficult when the primes are large. (35 = 7×5, a product of two primes, is easy; but 46,324,637 = 5,881 × 7,877 is harder, and primes used in RSA encryption are much larger still.) It is this fact alone that keeps Eve in the dark. She knows the product of the two prime numbers—but she can’t easily (and hopefully not at all) deduce what the two primes are!
The RSA Challenge
Right after the RSA system was invented, Martin Gardner published in Scientific American an encrypted message and a large RSA number, with 129 digits, that was the product of two primes. He challenged his readers to break the code, offering a $100 prize. It took 17 years for the number to be factored and the message deciphered. This was a relatively short period of time—many had expected that it would take an exceedingly long time, and Rivest, Shamir, and Adelman had jested that it could take several “quadrillion years.” The complex operation was achieved using distributed computing with thousands of computers around the world performing parts of the general calculation—thus demonstrating the power of such an approach.
RSA Security, founded by the academics, has since published several similar numbers, and for a time there was a cash prize offered for their factoring into pairs of primes, which the company subsequently withdrew. By now, some of these challenges have been met by mathematicians using distributed computing. Here is one problem that is still outstanding, an RSA number with 210 digits, that has never yet been factored into two primes:
RSA-210 = 245246644900278211976517663573088018467026787678332759743414451715061600830038587216952208399332071549103626827191679864079776723243005600592035631246561218465817904100131859299619933817012149335034875870551067
Obviously, the larger the number to be factored, the longer the time needed to break it into a pair of primes. Beyond a certain length (in decimal digits), the RSA code becomes impregnable and therefore any message based on it undecipherable (in a reasonably finite length of time) by an eavesdropper. The RSA algorithm is widely used today in Internet security.
NSA’s uses and abuses of encryption
In adopting standards for encryption in the United States, and for exporting encryption products, the NSA has pushed for, and succeeded in implementing, legal limits on the size of the numbers used in RSA coding, so that—with its supercomputers—it would be able to decipher any message based on it. Presumably, the Europeans are not bound by these restrictions, and their cryptanalysts should have been able to easily devise an unbreakable RSA code (by choosing primes that are large enough) for use in routine European diplomatic communications as well as protecting their computers from hacking.
And as history has shown, supercomputers are less effective than wide-ranging worldwide distributed computing for breaking advanced codes—but by its very nature, the NSA could never employ the latter. On the other hand, the most recent revelations seem to indicate that one of the purposes of NSA searches is in fact to identify people or entities that use encryption in their communications. If so, all the more reason for the European governments to use established, Western, advanced codes, so as to set themselves apart from terrorist entities, whose codes would necessarily look different. This would actually help the NSA concentrate on identifying real threats rather than wasting resources on intercepting Brussels messages such as: “Pierre, Italian or Chinese for lunch today? Yours, Hans.”
Thus we find ourselves where we do now, in an arms race of encryption and decryption, a world in which pure mathematics plays the key role in helping invent better and better codes. As the codes become more sophisticated, so do the code-breakers, and the cycle perpetuates itself. What is so amazing is that codes that were considered absolutely unbreakable a few decades ago do become breached as the technology improves—but then again, those designing new encryption methods, on all sides, use ever more complicated math to keep a step ahead of their pursuers.
*There are two good reasons for using modular arithmetic. The first is that it acts as a many-to-one function, in the sense that many numbers, when divided by a prime, will give the same remainder—thus making Eve’s life much more complicated (she can’t uniquely reconstruct Alice and Bob’s secret numbers). Using the clock example, if she should overhear that a meeting is to take place at 1 o’clock, she couldn’t tell if it’s a.m. or p.m., or which day. The second reason is that it puts a cap on the size of numbers involved when using exponentials, since (by definition!) without modular arithmetic these numbers grow “exponentially,” and could make computations intractable.
Image courtesy Maksim Kabakou / Shutterstock