# Unicity distance

2007-10-23

In order to be able to decipher a secret message through cryptanalysis, you need to have a sufficient quantity of data to evaluate whether it has been done properly. If all a cryptoanalyst has to work with is enciphered text (say, in the form of an intercepted message) the attempt to decipher it is called a ciphertext-only attack. For a variety of reasons, these are very tricky things to accomplish. The element described below is one of the most basic.

In order to understand why a message of sufficient length is important, consider a message that consists only of a single enciphered phone number: “724-826-5363.” These numbers could have been modified in any of a great number of ways: for instance, adding or subtracting a certain amount from each digit (or alternating between adding and subtracting). Without knowing more, or being willing to test lots of candidate phone numbers, we have no way of learning whether we have deciphered the message properly. On the basis of the ciphertext alone, 835-937-6474 is just as plausible as 502-604-3141.

Obviously, this is only a significant problem for short messages. One could imagine ways in which BHJG could mean ‘HIDE’ or ‘TREE’ or ‘TRAP.’ The use of different keys with the same algorithm could generate any four letter word from that ciphertext. Once we have a long enough enciphered message, however, it becomes a lot more obvious when we have deciphered it properly. If I know that the ciphertext:

UUEBJQPWZAYIVMNAZSUQPYJVOMDGZIQHWZCX

has been produced using the Vigenere cipher, and I find that it deciphers to:

IAMTHEVERYMODELOFAMODERNMAJORGENERAL

when I use the keyword MUSIC, it is highly likely that I have found both the key and the unenciphered text.

This concept is formalized in the idea of unicity distance: invented by Claude Shannon in the 1940s. Unicity distance describes the amount of ciphertext that we must have in order to be confident that we have found the right plaintext. This is a function of two things: the entropy of the plaintext message (something written in proper English is far less random than a phone number) and the length of the key being used for encryption.

To calculate the unicity distance for a mesage written in English, divide the length of the key in bits (say, 128 bits) by 6.8 (which is a measure of the level of redundancy in English). With about eighteen characters of ciphertext, we can be confident that we have found the correct message and not simply one of a number of possibilities, as in the phone number example. By definition, compressed files have redundancy removed; as such, you may want to divide the key length by about 2.5 to get their unicity distance. For truly random data, the level of redundancy is zero therefore the unicity distance is infinite. If I encipher a random number and send it to you, a person who intercepts it will never be able to determine – on the basis of the ciphertext alone – whether they have deciphered it properly.

For many types of data files, the unicity distance is comparable to that in normal English text. This holds for word processor files, spreadsheets, and many databases. Actually, many types of computer files have significantly smaller unicity distances because they have standardized beginnings. If I know that a file sent each morning begins with: “The following the the weather report for…” I can determine very quickly if I have deciphered it correctly.

Actually, the last example is particularly noteworthy. When cryptoanalysts are presented with a piece of ciphertext using a known cipher (say Enigma) and which is known to include a particular string of text (such as the weather report introduction), it can become enormously easier to determine the encryption key being used. These bits of probable texts are called ‘cribs‘ and they played an important role in Allied codebreaking efforts during the Second World War. The use of the German word ‘wetter’ at the same point in messages sent at the same time each day was quite useful for determining what that day’s key was.

Litty October 23, 2007 at 12:17 pm

Mark October 23, 2007 at 12:37 pm

You learn a new thing every day!

Anon October 23, 2007 at 1:39 pm

Here is a city whose private life takes place upstairs, its ground floors largely abandoned to the waves – especially after the terrible flood of November, 1966, when 194 centimetres of water destroyed much of Venice’s ground-level grandeur and launched a project that, today, every maritime city in the world is watching. Venice’s flood season, after all, could soon be Mumbai’s, Miami’s, Halifax’s flood season. Many scientists believe global warming and melting ice caps will cause ocean levels to rise 10 to 30 cm. So when we visit Venice today, we are visiting our homes tomorrow.

Anon @ Wadh October 23, 2007 at 11:30 pm

This information cannot possibly be useful to anyone who reads this…

Milan December 5, 2007 at 11:50 am

Random facts:

My office phone number is the same as the 124,573,291th through 124,573,300th digits of Pi.

My birthday starts at digit 196,469,286.

Anon December 16, 2007 at 6:36 pm

Basic result from group theory:

If p is a prime, then for integers a such that 0 < a < p, then ap – 1modp = 1.

This is almost never true when p is composite.

Anon December 16, 2007 at 6:37 pm

Rather:

If p is a prime, then for integers a such that 0 < a < p, then a^(p – 1)modp = 1.

Anon December 16, 2007 at 6:38 pm

This comment field doesn’t allow <sup></sup> tags

Milan March 21, 2008 at 9:20 pm