Ever since people have been able to write they have wanted to send each other messages that cannot be read by everyone else. If someone comes up with a “secret language” and explains it to only one other person, these two people can write letters to each other in the secret language that no one else can read. But, what if the secret language is not as secret as its inventors hoped?
Study of such secret languages is addressed in cryptology. It essentially consists of cryptography, which deals with encryption and decryption of messages, and cryptanalysis, which attempts to recover the original message from encrypted messages without knowing the secret language. The prefix “crypto-” comes from the Greek term kryptós, which means “secret.”
A cryptosystem consists of two computational rules (algorithms). The first algorithm, the encryption, converts a message (“plaintext”) into an encrypted message (“cipher text”). This usually requires a secret key, such as a password. The second algorithm carries out the decryption, retrieving the plaintext from the cipher text and secret key. Anyone who knows the method and the secret key can use them to encrypt and decrypt messages. That is why it is important that the secret key remains truly secret.
Many encryption methods operate with numbers rather than letters. At the start of the encryption process, a sequence of numbers must therefore be generated from a sequence of letters, the plaintext. This process is also called encoding. Although it may at first sight appear the same, encoding is not yet encryption.
Here’s one simple method: the letter “A” is assigned the number 0, the letter “B” the number 1, and so on:
In computers, texts are represented by sequences of numbers between 0 and 255. Those are exactly the numbers that can be expressed by a byte. Nowadays, an encoding called UTF-8 is mostly used. Upper- and lower-case letters are mapped as follows in this system:
In this encryption method, the letters in the alphabet are shifted by three places: An “A” becomes a “D”, a “B” becomes an “E”, and so on. Finally, at the end of the alphabet, you start over again: while “W” becomes “Z”, “X” becomes “A” and “Y” becomes “B”.
This method worked well in Caesar's day, when almost no one could read and write. Nowadays, however, it is very insecure.
With a substitution cipher, also called monoalphabetic substitution, certain letters are stipulated in a secret key as matching other letters. Each letter of the plaintext is subsequently encrypted using these pre-established letter pairs. The Caesar cipher is a simple example of such a substitution cipher. Another example is the following allocation of letters:
In total, 403,291,461,126,605,635,584,000,000 possible options exist for such letter swaps. That makes it unfeasible to simply try all the possibilities. This encryption was widely used in Europe in the Middle Ages and was considered unbreakable. However, in the Arab world people had known since the ninth century that such methods could be attacked quite easily. The Arabs had realized that not all letters occur in texts with the same frequency. Some letters, such as “E”, are found very frequently, while others, such as “Q”, occur very rarely.
Letter frequencies depend on the language. In German, for example, the distribution looks like this:
That means that counting which letters occur most often in the encrypted text often makes it possible to guess fairly rapidly some of the letter matches used. Once the first few letter correspondences have been established, more can generally be worked out pretty quickly.
There are many strategies to get round the problem of letter frequency. The most important is probably opting not to use the same substitution for every letter.
This can also be described with the encoding mentioned above: to add two letters, for example “M” and “U”, each is written as a number between 0 and 25. “M” becomes 12, “U” becomes 20. Next, those two numbers are added together: 32. If the figure obtained is 26 or higher, subtract 26. That turns 32 into 6. Through the encoding step, that number is expressed as the letter “G”. It is, however, much easier to do this with the above table.
But, what happens if the secret key text is shorter (or longer) than the plaintext? Simply repeat it (or omit the surplus part). For example, the table below shows how to encrypt "SECRETTEXT" with the key text "WORD":
Decryption works along the same lines:
Subtraction is needed for decryption. That is not so straightforward using the tabula recta shown above, but a corresponding table can be set up for subtraction:
Unfortunately, as it turned out, this procedure is not secure either. Mathematician Charles Babbage (1791–1871) and infantry officer Major Friedrich Kasiski (1805–1881) both managed to crack the code independently. Although Babbage found the solution years before Kasiski, he never published it and it was discovered in his estate. These attacks take advantage of the generally short secret text, which often consists of just one word.
In his 1883 text La Cryptographie militaire, Dutch linguist and cryptologist Auguste Kerckhoffs established rules for confidential communication. The most important of these rules is that “the system must not require secrecy.” This rule is often referred to as Kerckhoffs's principle.
The rule states that the security of an encryption method must not depend on the method per se, but only on the key used. When analyzing a method, the assumption must always be made that the attacker has detailed knowledge of the method.
Modern methods such as the Advanced Encryption Standard (AES) adhere to this principle. The AES has been around since 2001, while the underlying method was made public as early as 1998. Since then, many researchers have grappled with this algorithm – but no one has yet been able to break the encryption method. That is one of the reasons why it is considered secure. There are also a number of methods that did not adhere to this principle and subsequently proved insecure. Well-known examples include the encryption of movies on DVDs (Content Scramble System, or CSS for short) and the A5/1 and A5/2 encryption methods used in the GSM (“2G”) mobile communications standard introduced in 1990.
It is also the first method that can be shown to be absolutely secure. However, it must use purely random key texts rather than text in a human language. The requirements that a key text is never used twice and that it is purely random are essential. If users do not stick to those rules, the method can be cracked.
One case where this proved devastating was in what is known as the Venona project. During the Second World War and the subsequent Cold War, one-time pads were one of the methods used to protect communications between Soviet embassies and Moscow – but, the key texts were sometimes reused in a slightly altered form. That allowed American and British intelligence services to decipher some of these communications.
While the one-time pad is absolutely secure, it is also very unwieldy. A large enough number of sufficiently long key texts must be exchanged in advance with all communication partners. Any errors can rapidly have a devastating impact – as in the case of Venona. That is why this procedure is rarely used. The best known example is probably the “hotline” between Moscow and Washington during the Cold War.
For a long time, both encryption and decryption were tedious and time-consuming tasks. That is why mechanical and electrical tools were invented to handle all or part of the encryption and decryption. The most famous is Enigma, which was used by the Germans to encrypt communications before and during the Second World War.
Enigma’s encryption was cracked even before the outbreak of the Second World War thanks to cooperation between French, Polish, and British spies. Drawing on documents that a French spy had obtained from Germany, Polish mathematicians made a major breakthrough: They managed to decipher the first Enigma ciphers. The “bomba,” a machine that automatically tried out many possible configurations, played an important role in this process.
This success however proved short-lived, as the Germans steadily improved Enigma's encryption. But decryption grew increasingly refined too, thanks mainly to British cryptographers and mathematicians as well as Polish mathematicians who had now fled to France. Turing’s “bombe,” an improved version of the Polish “bomba,” played a key role in deciphering the messages.
The “bomba” and Turing’s “bombe” are thus the first devices used for large-scale automation of some aspects of cryptanalysis.
Modern encryption methods use bits rather than the letters of the alphabet. All data in computers is stored as sequences of zeros and ones, known as bits. Encryption methods use a sequence of bits as input data and then obtain a secret key, likewise as a bit-sequence, and use it to calculate the encryption, which is also a sequence of bits. The operations involved can be carried out rapidly by a computer, whereas it would be a very lengthy process if done manually.
One of the methods deployed most widely today is the 2001 Advanced Encryption Standard (AES), which emerged from a competition. Numerous research teams proposed candidates for an encryption method, which were publicly examined in several rounds. As the rounds progressed, more and more candidates were eliminated. Ultimately, the winner was the “Rijndael” algorithm developed by Belgian cryptographers Joan Daemen and Vincent Rijmen.
Asymmetric encryption methods are a very exciting innovation that would not be feasible in practice without computers. In classical, symmetric methods, both parties must agree on a secret key known only to them. In asymmetric methods, each party has two keys: a secret key that no one else may know and a public key that anyone can know.
Prof. Dr. Joachim Rosenthal, Institut für Mathematik, UZH
Dr. Felix Fontein
Jonas Voegeli, Visual Communication, ZHdK
Hubertus Design: Valentin Kaiser, Kerstin Landis, Nathan Meyer, Jonas Voegeli, Zurich
We'd like to thank Mathmos: inventors of the lava lamp for the donation of 100 lava lamps.