Cryptography is the science of secrets. Literally meaning “secret writing”, cryptography is designed to hide information from all but its intended recipients. Modern cryptography is essential to the secure Internet, corporate cybersecurity, and blockchain technology.

However, cryptography has a very long history before the modern ciphers were invented. In this three-part series, we’ll explore the history of cryptography before the 20th century, in the 20th century, and in the modern day.

Caesar’s Box

Caesar’s Box is one of the earliest known ciphers. Developed around 100 BC, it was used by Julius Caesar to send secret messages to his generals in the field. Since messengers could easily be waylaid by the enemy en route, the use of even a simple cryptographic algorithm to encode his orders and his generals’ responses could give him a significant strategic advantage: he could intercept and read his opponents’ messages but they can’t read his.

Caesar’s box is a particular implementation of a shift cipher (which is a specific type of substitution cipher). In Caesar’s Box, the encryption algorithm involved shifting each letter in the message three letters to the right to produce the ciphertext. For example, A became D, B became E, and X became A. Decryption involved reversing this process by shifting each letter of the ciphertext three steps to the left.

In Caesar’s Box, the secret step amount was three, but this isn’t the only option. Other shift ciphers (like ROT13) use different numbers of steps (like 13). However, all shift ciphers are insecure and can be easily broken.

Frequency Analysis: Breaking Caesar’s Box

It turns out that the hardest part of breaking Caesar’s Box is knowing the language of the message that it encodes. Once you know that, it’s easy to break the cipher using frequency analysis. If that fails, a brute force attack against the cipher can work, since there are only 26 possible shifts in English.

Frequency analysis attacks take advantage of the fact that all of the letters in the English alphabet are not used equally. E is the most commonly used letter, and you hardly ever see a word using z. In fact, this is the first time it appears in this article. The relative frequencies of each letter in the English language are shown in the graph below.

The frequencies of the letters usage in English is important because Caesar’s Box does nothing to change them. In a ciphertext encrypted with Caesar’s box, the most common letter is likely to be h.

Knowing this, it’s easy to determine the shift factor for Caesar’s Box or any other shift cipher out there. With this information, the rest of the ciphertext can be easily decrypted. In the event that the shift factor is incorrect (a sentence may have more t’s than e’s for example), there are few enough options for a shift cipher that it’s easy to try them all.

Vigenere’s Cipher

Caesar’s Box may be the first famous cipher in existence, but it’s missing something rather important for modern ciphers: a secret key. While the number of steps to shift may be considered the secret key for a general shift cipher, this value is hardcoded as three for Caesar’s Box. As a result, the security of Caesar’s Box relies on the fact that no-one knows how it works, a practice called “security through obscurity” that violates Kerckhoff’s Law.

Vigenere’s cipher was created in the 16th century and introduced the concept of a secret key. In Vigenere’s cipher, the secret key is another word or phrase that may be shorter than the plaintext to be encrypted. If this is the case, the key is repeated until it matches the length of the plaintext.

To encrypt using Vigenere’s cipher, you convert the letters in the plaintext and the key to numbers in the range 0-25, add each pair of numbers together, and calculate the result modulus 26 (the result of dividing by 26 and keeping the remainder). The output of this calculation is mapped back to a letter and used as a character of the ciphertext.

The image above shows a lookup table for performing encryption with the Vigenere cipher. The columns are the letters of the secret key and the rows are the letters of the corresponding letter of plaintext. Their intersection is the letter of ciphertext created from a given pair of plaintext and key letters.

Cryptanalysis of Vigenere’s Cipher

Like Caesar’s box, decryption of the Vigenere cipher can be performed using frequency analysis. However, it takes a little more work. In order to decrypt Vigenere’s cipher, it’s necessary to first determine the period of the cipher and then apply frequency analysis.

The period of the Vigenere cipher is the length of the secret key used for encryption. For example, encryption of a 36 letter plaintext with the key CIPHER would actually use a key of CIPHERCIPHERCIPHERCIPHERCIPHERCIPHER.

Once you know the period of the cipher, it’s possible to decrypt it just like a Caesar cipher. Note from the image above that each particular letter of the key just creates a shift of a certain amount. If you know the letters of the ciphertext that used the same shift value, you can apply frequency analysis to the cipher.

This is why the period of the cipher is important. Note that the first, seventh, etc. letters of the sample key are all C (a shift of 2). Through either guess and check or using calculation of the Index of Coincident, it’s possible to determine the actual period, shift values, and plaintext of the cipher.

Coming Up: 20th Century Ciphers

Caesar’s Box and the Vigenere cipher are two of the earliest known ciphers. They pioneered the use of encryption to protect sensitive communications data and the use of a secret key in encryption. In the next post in this series, we’ll move forward to the 20th century. There, we’ll see how cryptography evolved when driven both by military interests and organizations protecting their intellectual property.