Strengthening substitution ciphers



The biggest problem with substitution ciphers (those that replace each letter with a particular other letter or symbol) is that they are vulnerable to frequency analysis. In any language, some letters are more common than others. By matching up the most common symbols with what you know the most common letters are, you can begin deciphering the message. Likewise, you can use rules like ‘a rare letter that almost always appears immediately before one specific more common letter is probably a Q.’ What is needed to strengthen such ciphers is a language in which words have no such ‘personality.’ Here is how to do it:
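As a rough illustration of the attack described above, here is a minimal Python sketch that ranks the symbols of a ciphertext by how often they appear. The function name `letter_frequencies` and the toy message are my own inventions for illustration; a real attack would need far more text and would also look at letter pairs.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, share) pairs, most common first, ignoring non-letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return [(ch, n / total) for ch, n in counts.most_common()]

# Toy ciphertext (an assumption for this example): the phrase
# "attack at dawn and retreat at dusk" with every letter shifted by 3.
ciphertext = "dwwdfn dw gdzq dqg uhwuhdw dw gxvn"

ranking = letter_frequencies(ciphertext)
for letter, share in ranking[:3]:
    print(f"{letter}: {share:.2f}")
```

In this toy case the most common symbol, `d`, does stand for ‘a’, and the runner-up, `w`, stands for ‘t’. With so short a message that is partly luck, but with enough ciphertext the ranking becomes a reliable guide.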

First, take all the short words (fewer than three letters) and assign each of them a random three-letter code. Lengthening very short words further strengthens this approach because short words are the most vulnerable to frequency analysis; a single letter sitting with spaces on either side is probably ‘a’ or ‘i.’ Using three-letter groups and 26 letters, you can cover 17,576 words. Now, take as many words from the whole language as you want to be able to use and assign each of them a random four-letter code. For the sake of completeness, let’s use the entire Oxford English Dictionary. The 456,976 possible four-letter groups should be enough to cover its headwords, leaving some space for technical terms that we may want to encrypt but which might not be included. If we need even more possibilities, there are 11,881,376 five-letter combinations.
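The counts quoted above are simply powers of 26, and building the dictionary is a matter of handing out distinct random letter groups. The sketch below shows one way this might look in Python. The function name `build_codebook`, its parameters, and the tiny word list are assumptions of mine, and for simplicity it gives every word a code of the same length rather than treating short words separately.

```python
import random
import string

# Number of distinct letter groups of each length.
for length in (3, 4, 5):
    print(length, 26 ** length)

def build_codebook(words, code_length=4, seed=None):
    """Give each distinct word its own random code of `code_length` letters."""
    vocabulary = sorted(set(words))
    capacity = 26 ** code_length
    if len(vocabulary) > capacity:
        raise ValueError("not enough codes of this length for the vocabulary")
    rng = random.Random(seed)
    # Draw distinct numbers, then write each one out in base 26 as letters.
    numbers = rng.sample(range(capacity), len(vocabulary))
    codebook = {}
    for word, number in zip(vocabulary, numbers):
        letters = []
        for _ in range(code_length):
            number, remainder = divmod(number, 26)
            letters.append(string.ascii_lowercase[remainder])
        codebook[word] = "".join(letters)
    return codebook

# Hypothetical miniature vocabulary, standing in for a full dictionary.
codebook = build_codebook(["a", "i", "at", "the", "dawn", "attack"], seed=1)
```

Because the numbers are sampled without repetition, no two words can end up sharing a code, which is what makes the translation reversible.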

This approach is cryptographically valuable for a number of reasons. Since the codes representing words are random collections of letters, the letter frequencies in a ‘translated’ message no longer follow those of English. You no longer need to worry that some English letters are more common than others. Just as important, there are none of the ‘Q’ type rules by which to later attack the substitution cipher. The dictionary of equivalencies would not need to be secret; indeed, it should be widely available. Having the dictionary does not make encrypted messages more vulnerable, since they will have passed through a substitution cipher before being distributed and are fundamentally more robust to the cryptanalysis of substitution ciphers than a message enciphered from standard English would be.
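To make the two stages concrete, here is a minimal Python sketch of the whole pipeline: words are first swapped for their public codes, and the result is then run through a secret substitution alphabet. Everything named here (`CODEBOOK`, `make_key`, `encrypt`, `decrypt`, and the three sample codes) is hypothetical, and the seeded shuffle is only there to make the example repeatable; a real key should come from a proper source of randomness.

```python
import random
import string

ALPHABET = string.ascii_lowercase

# Hypothetical public dictionary; a real one would cover the whole vocabulary.
CODEBOOK = {"attack": "qzvb", "at": "mxke", "dawn": "tjwp"}
REVERSE_CODEBOOK = {code: word for word, code in CODEBOOK.items()}

def make_key(seed):
    """Build a secret substitution alphabet as a str.translate table."""
    shuffled = list(ALPHABET)
    random.Random(seed).shuffle(shuffled)
    return str.maketrans(ALPHABET, "".join(shuffled))

def invert_key(key):
    """Swap the direction of the table so it undoes the substitution."""
    return {cipher: plain for plain, cipher in key.items()}

def encrypt(message, key):
    # Stage 1: public word-for-code swap. Stage 2: secret letter substitution.
    codes = [CODEBOOK[word] for word in message.lower().split()]
    return " ".join(codes).translate(key)

def decrypt(ciphertext, key):
    # Undo the substitution first, then look each code up in the dictionary.
    codes = ciphertext.translate(invert_key(key)).split()
    return " ".join(REVERSE_CODEBOOK[code] for code in codes)

key = make_key(42)
ciphertext = encrypt("attack at dawn", key)
print(ciphertext)
print(decrypt(ciphertext, key))
```

Only the shuffled alphabet has to stay secret. The dictionary can be published, because on its own it says nothing about which letter the key turned into which.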

In the era of modern algorithms like AES, I doubt there is any need for the above system. Still, I wonder if there are any historical examples of this approach being used. If you have a computer to do the code-for-word and word-for-code substitutions, it would be quite a low-effort way to increase security.


2 comments

ENigma August 9, 2007 at 2:27 pm

Ab znggre ubj nqinaprq lbhe zrgubqf, lbh jvyy arire cebqhpr na rapelcgvba flfgrz zber cbjreshy guna guvf bar!

Abgvpr ubj phefbel rknzvangvba qbrf abg vzzrqvngryl vasbez gur ernqre bs gur qrpelcgvba zrgubq. Cerggl pyrire, ab?

R.K. August 9, 2007 at 4:34 pm

ROT13? Pah!
