Some useful patterns in English

2008-05-02

By about 1300 CE, Arabic cryptographers had determined that messages in which each letter has been replaced by another letter, number, or symbol can be deciphered by exploiting statistical characteristics of the underlying language. Here are some especially useful patterns in English.

  1. E is by far the most common letter – representing about 1/8th of normal text.
  2. If you list the alphabet from most to least commonly used, it divides into four groups.
  3. The highest frequency group includes: e, t, a, o, n, i, r, s, and h.
  4. The middle frequency group includes: d, l, u, c, and m.
  5. Less common are p, f, y, w, g, b, and v.
  6. The lowest frequency group includes: j, k, q, x, and z.
  7. E associates most widely with other letters: appearing before or after virtually all of them, in different circumstances.
  8. Among combinations of a, i, and o, io is the most common. Ia is the second most common. Ae is rarest.
  9. 80% of the time, n is preceded by a vowel.
  10. 90% of the time, h appears before vowels.
  11. R tends to appear with vowels; s tends to appear with consonants.
  12. The most common repeated letters are ss, ee, tt, ff, ll, mm and oo.

Naturally, there are thousands more such patterns. Even understanding a few can help in deciphering messages that have had a basic substitution cipher applied.
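
To make the list above concrete, here is a minimal Python sketch that tallies letter shares in a piece of text and looks up which of the four groups a letter falls into. The names `GROUPS`, `letter_frequencies`, and `group_of` are made up for this illustration, and the sample sentence is far too short to reproduce the real proportions.

```python
from collections import Counter

# The four frequency groups from the list above, most to least common.
GROUPS = {
    "high": "etaonirsh",
    "middle": "dlucm",
    "less common": "pfywgbv",
    "lowest": "jkqxz",
}

def letter_frequencies(text):
    """Return {letter: share of all letters} for the letters a-z in text."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return {}
    return {ch: n / total for ch, n in Counter(letters).items()}

def group_of(letter):
    """Name the frequency group a single English letter belongs to."""
    if len(letter) == 1:
        for name, members in GROUPS.items():
            if letter.lower() in members:
                return name
    raise ValueError(f"not an English letter: {letter!r}")

# Show the five most frequent letters of a short sample with their groups.
sample = "Even understanding a few can help in deciphering messages"
freqs = letter_frequencies(sample)
for ch, share in sorted(freqs.items(), key=lambda kv: -kv[1])[:5]:
    print(ch, f"{share:.1%}", group_of(ch))
```

On longer stretches of ordinary English the top of this ranking should drift toward the highest frequency group; on a sentence or two it can easily be off.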

Here’s one to try out:

LKCLHQBCKDRCPQQBDKAPZULSQUCDK
AZRDTDGPCOTZKQDPQBZQDQZHHLOIP
XLSVDQBZAOCZQICZGLHQDJCQLOCZI
QBDKAPQBZQDKQCOCPQXLSDKXLSOPM
ZOCQDJCSKHLOQSKZQCGXLQQZVZDPO
CGZQDTCGXMLLOGXMOLTDICIHLOVDQ
BRZHCPQLLJZKXLHQBCJRGLPCNSDQC
CZOGXDKQBCCTCKDKA

One hint is that cipher alphabets are not always entirely random. The tools on this page are useful for cracking monoalphabetic substitution ciphers.
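
A natural first step on the puzzle above is to rank the ciphertext letters by how often they appear. The sketch below does only that; `CIPHERTEXT` and `ranked_letters` are names chosen for this example, not part of any tool mentioned here.

```python
from collections import Counter

# The puzzle ciphertext from this post, joined into one string.
CIPHERTEXT = (
    "LKCLHQBCKDRCPQQBDKAPZULSQUCDK"
    "AZRDTDGPCOTZKQDPQBZQDQZHHLOIP"
    "XLSVDQBZAOCZQICZGLHQDJCQLOCZI"
    "QBDKAPQBZQDKQCOCPQXLSDKXLSOPM"
    "ZOCQDJCSKHLOQSKZQCGXLQQZVZDPO"
    "CGZQDTCGXMLLOGXMOLTDICIHLOVDQ"
    "BRZHCPQLLJZKXLHQBCJRGLPCNSDQC"
    "CZOGXDKQBCCTCKDKA"
)

def ranked_letters(text):
    """List (letter, count) pairs, most frequent first."""
    return Counter(ch for ch in text if ch.isalpha()).most_common()

ranking = ranked_letters(CIPHERTEXT)
for letter, count in ranking[:5]:
    print(letter, count)
```

On this ciphertext, Q and C come out as the two most frequent letters, which makes them natural candidates for letters from the highest frequency group.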

Report a typo or inaccuracy

9 comments

Milan May 2, 2008 at 10:21 am

Here is a much easier-to-solve ciphertext from Simon Singh’s website.

What a difference spaces make…

Anonymous May 2, 2008 at 4:10 pm

R.K. May 3, 2008 at 3:08 pm

In your puzzle, I am betting C is ‘e’ and Q is ‘t.’

R.K. May 3, 2008 at 3:09 pm

If Q is ‘t’, QQZVZ might be ‘Ottawa.’

R.K. May 3, 2008 at 3:09 pm

LQQZVZ, rather

R.K. May 3, 2008 at 3:11 pm

oKeoHtBeKDRePttBDKAPaUoStUeDK
AaRDTDGPeOTaKtDPtBatDtaHHoOIP
XoSwDtBaAOeatIeaGoHtDJetoOeaI
tBDKAPtBatDKteOePtXoSDKXoSOPM
aOetDJeSKHoOtSKateGXottawaDPO
eGatDTeGXMooOGXMOoTDIeIHoOwDt
BRaHePtooJaKXoHtBeJRGoPeNSDte
eaOGXDKtBeeTeKDKA
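
The partially solved text above follows a common convention: guessed plaintext letters go in lowercase, unsolved cipher letters stay in uppercase. A rough sketch of that step, assuming the guesses so far are C, Q, L, Z, and V (the function name `apply_partial_key` is invented for this example):

```python
def apply_partial_key(ciphertext, key):
    """Swap in lowercase plaintext for solved letters; leave the rest uppercase."""
    return "".join(key.get(ch, ch) for ch in ciphertext)

# Guesses so far: cipher letter -> plaintext letter.
guesses = {"C": "e", "Q": "t", "L": "o", "Z": "a", "V": "w"}

first_line = "LKCLHQBCKDRCPQQBDKAPZULSQUCDK"
print(apply_partial_key(first_line, guesses))
```

Adding another guess is just one more entry in the dictionary, so it is cheap to try a letter, look at the result, and back it out if the text turns to nonsense.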

R.K. May 3, 2008 at 3:16 pm

‘tBe’ occurs three times, so B may be ‘h.’

oKeoHtheKDRePtthDKAPaUoStUeDK
AaRDTDGPeOTaKtDPthatDtaHHoOIP
XoSwDthaAOeatIeaGoHtDJetoOeaI
thDKAPthatDKteOePtXoSDKXoSOPM
aOetDJeSKHoOtSKateGXottawaDPO
eGatDTeGXMooOGXMOoTDIeIHoOwDt
hRaHePtooJaKXoHtheJRGoPeNSDte
eaOGXDKtheeTeKDKA
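
Spotting a repeated three-letter group like this can also be done mechanically. Here is a small sketch that counts every run of n consecutive letters in the original ciphertext, where 'tBe' corresponds to QBC; `ngram_counts` is a name made up for this example.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# The puzzle ciphertext from the post, joined into one string.
CIPHERTEXT = (
    "LKCLHQBCKDRCPQQBDKAPZULSQUCDK"
    "AZRDTDGPCOTZKQDPQBZQDQZHHLOIP"
    "XLSVDQBZAOCZQICZGLHQDJCQLOCZI"
    "QBDKAPQBZQDKQCOCPQXLSDKXLSOPM"
    "ZOCQDJCSKHLOQSKZQCGXLQQZVZDPO"
    "CGZQDTCGXMLLOGXMOLTDICIHLOVDQ"
    "BRZHCPQLLJZKXLHQBCJRGLPCNSDQC"
    "CZOGXDKQBCCTCKDKA"
)

trigrams = ngram_counts(CIPHERTEXT, 3)
print(trigrams["QBC"])  # prints 3
```

Because the puzzle has no spaces, some counted groups straddle word boundaries, so a high count is a hint rather than proof of a common word.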

R.K. May 3, 2008 at 3:18 pm

abcdefghijklmnopqrstuvwxyz
Z---C--B------L----Q--V---

Anon May 7, 2008 at 10:09 am

I am betting the cipher alphabet is in the form:

abcdefghijklmnopqrstuvwxyz
WORDABCEFGHIJKLMNPQSTUVXYZ
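
A keyword alphabet of this form is easy to generate: write the keyword's distinct letters first, then the remaining letters in their usual order, so that each of the 26 letters appears exactly once. A minimal sketch, using the placeholder keyword "word" from the guess above (the puzzle's actual keyword is not assumed here, and `keyword_alphabet` is a name invented for this example):

```python
import string

def keyword_alphabet(keyword):
    """Cipher alphabet: the keyword's distinct letters, then the unused letters in order."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

# Plain alphabet on top, cipher alphabet underneath.
print(string.ascii_lowercase)
print(keyword_alphabet("word"))
```

If the guess about the form is right, recovering the first few cipher letters may reveal the keyword, and the rest of the alphabet then falls out in order.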
