Statistics in cryptanalysis and paleoclimatology

Reading Wallace Broecker’s new book on paleoclimatology, I realized that a statistical technique from cryptanalysis could be useful in that field as well. Just as the index of coincidence can be used to match up different ciphertexts partially or completely enciphered with the same key and polyalphabetic cryptosystem, the same basic statistics could be used to match up ice or sediment samples by date.

As with the cryptographic approach, you would start with the two sections randomly aligned and then alter their relative positions until you see a big jump in the correlation between them. At that point, it is more likely than not that you have aligned the two. It probably won’t work perfectly with core samples – since they get squished and stretched by geological events and churned by plants and animals – but an approach based on the same general principle could still work.
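
Just to make the idea concrete, here is a minimal sketch in Python, assuming the two core records have already been put on a common depth spacing; the function names and the minimum-overlap figure are my own illustrative choices, not anything from Broecker’s book:

def pearson(xs, ys):
    # Pearson correlation of two equal-length numeric sequences
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def best_alignment(core_a, core_b, min_overlap=10):
    # Slide core_b along core_a and return the (offset, correlation) pair
    # where the overlapping measurements agree best
    best = (0, -1.0)
    for offset in range(-len(core_b) + min_overlap, len(core_a) - min_overlap + 1):
        lo = max(0, offset)
        hi = min(len(core_a), offset + len(core_b))
        r = pearson(core_a[lo:hi], core_b[lo - offset:hi - offset])
        if r > best[1]:
            best = (offset, r)
    return best

A real record would also need to allow for the squishing and stretching mentioned above – warping the depth scale rather than just sliding it – but the jump-in-correlation principle is the same.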

Doubtless, some clever paleoclimatologist devised such a technique long ago. Nonetheless, it demonstrates how even bits of knowledge that seem utterly unrelated can sometimes bump up against one another fortuitously.

Oil’s next century

Spiky blue flowers

With oil prices at levels rivaling those during the crises of the 1970s, virtually everyone is clamouring for predictions about medium- and long-term prices. Those concerned about climate change are also actively wondering what effect higher hydrocarbon prices will have.

In order to know what the future of oil looks like, answers are required to a number of questions:

  1. How will the supply of oil change during the decades ahead? How many new reserves will be found, where, and with what price of extraction? How much can Saudi Arabia and Russia expand production? When will their output peak?
  2. How will the demand for oil change? How much and how quickly will high prices depress demand in developed states? What about fast growing developing states like India and China?
  3. At what rate, and at what cost, will oil alternatives emerge? Will anyone work out how to produce cellulosic ethanol? Will the development of oil sands and/or oil shale continue apace?
  4. What geopolitical consequences will prices have? If prices are very high, will that prove destabilizing within or between states?
  5. Will the emerging alternatives to oil be carbon intensive (oil sands, corn ethanol) or relatively green (cellulosic ethanol, biomass to liquids)?

Of course, nobody knows the answer to any of this with certainty. There are ideological optimists who assert that humanity will respond to incentives, innovate, and prosper. There are those who allege that oil production is bound to crash, and that civilization as we know it is likely to crash as well.

Mindful of the dangers of prediction, I will hold off on expressing an opinion of my own right now. The magnitude of the questions is far too great to permit solution by one limited mind. What contemplating the variables does allow is an appreciation for the vastness and importance of the issue. Virtually any combination of answers to the questions above will bring new complications to world history.

The index of coincidence

Purple irises

If you are dealing with a long polyalphabetically enciphered message with a short key, the Kasiski examination is an effective mechanism of cryptanalysis. Using repeated sections in the ciphertext, and the assumption that these are often places where the same piece of plaintext was enciphered with the same portion of the key, you can work out the length of the keyword. Then, it is just a matter of dividing the message into X collections of letters (X corresponding to the length of the keyword) and performing a frequency analysis of each. That way, you can identify the cipher alphabet used in each of the encipherments, as well as the keyletter.
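
As a rough sketch of the mechanical part of that process (an illustration only, with function names of my own invention), the repetition counting and column splitting might look like this in Python:

from collections import Counter

def kasiski_spacings(ciphertext, seq_len=3):
    # Distances between consecutive occurrences of each repeated substring
    last_seen = {}
    spacings = []
    for i in range(len(ciphertext) - seq_len + 1):
        seq = ciphertext[i:i + seq_len]
        if seq in last_seen:
            spacings.append(i - last_seen[seq])
        last_seen[seq] = i
    return spacings

def likely_key_lengths(spacings, max_len=20):
    # Spacings caused by genuine repetitions tend to be multiples of the key length
    counts = Counter()
    for length in range(2, max_len + 1):
        counts[length] = sum(1 for s in spacings if s % length == 0)
    return counts.most_common()

def columns(ciphertext, key_length):
    # One stream of letters per cipher alphabet, each ready for frequency analysis
    return [ciphertext[i::key_length] for i in range(key_length)]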

If the key is long, however, it may be impossible to get enough letters per alphabet to perform a frequency analysis. Similarly, there may not be enough repetitions in the key to create the pairings Kasiski requires. Here, the clever technique of the index of coincidence may be the answer.

Consider two scenarios, one in which you have two strings of random letters and one in which you have two strings of English:

GKECOAENCYBGDWQMGGRR
VQNWSKXMJWTBKCCMRJUO

TOSTRIVETOSEEKTOFIND
SOWEBEATONBOATSAGAIN

At issue is the number of times letters will match between the top and bottom row. When the strings are random, the chance of a match at any position is 1/26, or about 0.0385. Because some letters in English are more common than others, a match is more likely between two pieces of English text. Imagine, for instance, that 75% of the letters in a normal English sentence were ‘E.’ Any two pieces of English text would get a lot of ‘E’ matches. Even if enciphered so that ‘E’ is represented by something else, the number of matches would remain higher than for a random sample.
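
Counting matches is simple enough to sketch in a few lines of Python (an illustration only; the function name is mine):

def coincidence_rate(text_a, text_b):
    # Fraction of positions where two aligned texts share the same letter
    matches = sum(1 for a, b in zip(text_a, text_b) if a == b)
    return matches / min(len(text_a), len(text_b))

# The two pairs of strings above happen to give one match in twenty (0.05) each;
# samples this short cannot show the difference, but over long texts random
# alignments settle near 1/26 (about 0.0385) while aligned English settles
# near 0.0667.
print(coincidence_rate("GKECOAENCYBGDWQMGGRR", "VQNWSKXMJWTBKCCMRJUO"))
print(coincidence_rate("TOSTRIVETOSEEKTOFIND", "SOWEBEATONBOATSAGAIN"))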

Since polyalphabetic ciphers involve enciphering each letter in a plaintext using a different ciphertext alphabet, an ‘E’ in one part of a ciphertext need not represent an ‘E’ somewhere else. That being said, as long as you line up two ciphertext messages so the letter on top and the letter underneath are using the same alphabet, you will get the same pattern of better-than-random matches for English text. Imagine, to begin with, a message enciphered using five different alphabets (1, 2, 3, 4, and 5). Two messages using the same alphabets and key (say, 54321) could be lined up either in a matching way or in an offset way:

543215432154321
543215432154321

543215432154321
321543215432154

Note that these strings describe the alphabet being used to encipher each plaintext letter, not the letter itself. In the second, offset case, the probability of a match should be essentially random (one property of polyalphabetic ciphers is that they flatten out the distribution of letters from the underlying plaintext). In the first, matching case, we would get the same matching probability as with unenciphered English (about 0.0667). We can thus take any two messages enciphered with the same key and try shifting them against each other, one letter at a time. When the proportion of matches jumps from about 0.0385 to about 0.0667, we can conclude that the two have been properly matched up. This is true regardless of the length of the key, and can be used with messages that are not of the same length.
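
A sketch of that shifting procedure might look like this in Python (only an illustration; the names and the 200-shift limit are invented for the example):

def match_rate(a, b):
    # Fraction of overlapping positions where two texts share a letter
    overlap = min(len(a), len(b))
    return sum(1 for x, y in zip(a, b) if x == y) / overlap if overlap else 0.0

def scan_shifts(msg_a, msg_b, max_shift=200):
    # Match rate at each relative shift; a jump from roughly 0.0385 toward
    # roughly 0.0667 suggests the two ciphertexts are now aligned on the same key
    results = []
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            results.append((shift, match_rate(msg_a[shift:], msg_b)))
        else:
            results.append((shift, match_rate(msg_a, msg_b[-shift:])))
    return results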

This doesn’t actually solve the messages for us, but it goes a long way toward that end. The more messages we can collect and properly align, the more plausible it becomes to crack the entire collection and recover the key. This method was devised by William F. Friedman, possibly America’s greatest cryptographer, and is notable because anybody sufficiently clever could have invented it back when polyalphabetics were first used (16th century or earlier). With computers to do the shifting and statistics for us, the application of the index of coincidence is a powerful method for use against polyalphabetic substitution ciphers, including one time pads where the operators have carelessly recycled sections of the key.

Rommel and cryptography

One of the most interesting historical sections so far in David Kahn’s The Codebreakers describes the campaign in North Africa during WWII. Because of a spy working in the US embassy in Rome, the American BLACK code and its accompanying superencipherment tables were stolen. This had a number of major tactical impacts, because it allowed Rommel to read the detailed dispatches being sent back by the American military attaché in Cairo.

Kahn argues that this intelligence played a key role in Rommel’s critical search for fuel. His supply line across the Mediterranean was threatened by the British presence in Malta. Knowledge about a major resupply effort allowed him to thwart commando attacks against his own aircraft and turn back two major resupply convoys. It also provided vital information on Allied defences during his push towards Suez.

The loss of Rommel’s experienced cryptographers due to an accidental encounter with British forces had similarly huge consequences. It cut off the flow of intelligence, both because codes were changed and because personnel were lost. As a result, the Allied assault at Alamein proved to be a surprise for Rommel and an important turning point.

As with so many examples in warfare, this demonstrates the huge role of chance in determining outcomes. Had security been better at the embassy in Rome, Rommel might have been stopped sooner. Had the German tactical intelligence team not been intercepted, Rommel might have had detailed warnings about Alamein. The example also shows how critical intelligence and cryptography can be in the unfolding of world affairs.

Border guards and copyright enforcement

According to Boing Boing, Canadian border guards may soon be in charge of checking iPods and other devices for copyright infringement. If true, the plan is absurd for several reasons. For one, it would be impossible for them to determine whether a DRM-free song on your iPod was legitimately ripped from a CD you own or downloaded from the web. For another, this is a serious misuse of their time. It would be a distraction from decidedly more important tasks, like looking for illegal weapons, and probably a significant irritant to both those being scrutinized and those waiting at border crossings.

Hopefully these rumours of secret plans – also picked up by the Vancouver Sun – are simply false.

Multiple anagramming

Emily Horn in a heap of clothes

The process of cryptanalysis can be greatly simplified if one possesses more than one message encrypted with the same key. One especially important technique is multiple anagramming. Indeed, it may be the only way to decipher two or more messages that have been enciphered using a one time pad.

The basic idea of multiple anagramming is that you can use one message to guess what possible keys might be, then use another message to check whether a given guess is correct. For instance, imagine we have these two messages and think they were enciphered using the Vigenère cipher:

SGEBVYAUZUYKRQLBCGKEFONNKNSMFRHULSQ
TUEEDAKHNVKUEOICHKIEPOHRIFDQSPHGEGQ

Now, suppose we think the first message might be addressed to Derek, Sarah, or Steve.

Using words we think the message might start with, we can guess at a key. If the first word is DEREK, the key would start with ‘PCNXL’. If the first word is SARAH, the key would start with ‘AGNBO’. Finally, if the first word is STEVE, the key would start with ‘ANAGR’. Here, the key is a bit of a clue. Normally, there would be no easy way to tell from one message whether we had found the correct key or not.

We can then test those keys against the second message. The first key yields ‘ESRHS’ for the first five letters. The second, ‘TORDP’. The third yields ‘THEYM’. The third looks the most promising. Through either guessing or testing further letters, we can discover that the key is ‘ANAGRAM’. The second message is thus ‘THEYMAYHAVEDECIPHEREDOURCODESCHANGE.’ Having two ciphertexts that produce sensible plaintexts from the same key suggests that we have properly identified the cipher and key being used. We can then easily decipher any other messages based on the same combination.
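
The whole test can be sketched in a few lines of Python, using the two ciphertexts above (the helper names are my own):

A2Z = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def derive_key(ciphertext, crib):
    # Key letters implied by assuming the crib is the plaintext at the start
    return "".join(A2Z[(A2Z.index(c) - A2Z.index(p)) % 26]
                   for c, p in zip(ciphertext, crib))

def decipher(ciphertext, key):
    # Vigenere decipherment with a repeating key
    return "".join(A2Z[(A2Z.index(c) - A2Z.index(key[i % len(key)])) % 26]
                   for i, c in enumerate(ciphertext))

msg1 = "SGEBVYAUZUYKRQLBCGKEFONNKNSMFRHULSQ"
msg2 = "TUEEDAKHNVKUEOICHKIEPOHRIFDQSPHGEGQ"

for crib in ("DEREK", "SARAH", "STEVE"):
    key_start = derive_key(msg1, crib)
    print(crib, key_start, decipher(msg2, key_start)[:len(crib)])
# DEREK PCNXL ESRHS
# SARAH AGNBO TORDP
# STEVE ANAGR THEYM  <- only this one looks like the start of English

print(decipher(msg2, "ANAGRAM"))  # THEYMAYHAVEDECIPHEREDOURCODESCHANGE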

Destroying Iraqi RADAR in 1991

Smoker and fire escape

Anyone who has been trawling the internet in a search for information on the suppression of air defences during the first Gulf War might be well served by this article. In particular, it goes into a lot of detail about the location, identification, targeting, and destruction of Iraqi RADAR installations using weapons like the American AGM-88 High-speed Anti-Radiation Missile (HARM) and the British ALARM (Air Launched Anti-Radiation Missile). The article highlights how the use of Soviet equipment by Iraq made this a kind of test situation for NATO versus Warsaw Pact air defence and attack equipment.

What this suggests is that the NATO-Warpac central European air battle would have probably followed a similar course, leading to the defeat of the Communists’ IADS within a week or so, in turn leading to air superiority in the following week, as the Communist air forces would have withered under the fire of the Allied counter-air campaign. Fortunately this never had to happen and the world has been spared the inevitable nuclear response to the lost air battle and hence total conventional defeat through attrition by air.

Not a very comforting conclusion for the world at large, though no doubt gratifying for all the companies that built American planes and missiles and things.

One interesting tactic was the use of Brunswick Tactical Air Launched Decoys. These simulated the appearance of incoming aircraft, causing Iraqi RADAR installations to ‘light up’ in order to target them. Sometimes, they would draw fire from surface-to-air missile batteries. Often, this would leave those batteries temporarily defenceless at a time when their position – and that of their supporting RADAR – had been revealed. Both could then be targeted by NATO aircraft. The ruse was apparently so effective that the Iraqi armed forces maintained the false belief that they had destroyed several hundred British and American planes.

There is also a fair bit of information about jamming and other forms of electronic countermeasures. All in all, it provides an interesting glimpse back into a period when conventional warfare against standing armies was something NATO still did.

Keeping the bombs in their silos

Window and siding

Back in 2005, former US Secretary of Defense Robert McNamara wrote an article in Foreign Policy about the danger of the accidental or unauthorized use of nuclear weapons. The issue remains an important one, particularly given trends like Russia’s increasingly assertive behaviour (putting more nuclear weapons out where accidents or miscalculations could occur), as well as ongoing nuclear proliferation.

Writing for Slate, Ron Rosenbaum describes steps the next US President could take to reduce ‘inadvertence.’ The danger of nuclear war may seem like a dated Cold War concern, but the sheer number of weapons on fifteen-minute alert, the pressure on leaders to make an immediate decision when the military thinks an attack is taking place, and the growing number of states with nuclear technology all mean that it should remain a contemporary concern and an area for corrective action.

Improvised explosive devices

Trash in the Rideau Canal locks

The Washington Post has an interesting special feature on improvised explosive devices (IEDs) in Iraq and Afghanistan. While the overall themes are quite common – Western forces are much less effective against insurgents than against conventional armies, and low-cost, low-tech weapons can neutralize huge advantages in funds and technology – the specific details provided are quite interesting.

IEDs are apparently the single biggest killer of coalition troops in Afghanistan and Iraq. Partly, that is the result of not having large enough forces to monitor important routes continuously. Partly, it is the product of the sheer volume of explosives available in both states. Partly, it is the result of assistance provided by other states or sub-state groups, such as Iranian assistance being provided to some Shiite groups. Explosively formed penetrators – capable of firing six or seven pounds of copper at 2000 metres per second – are an excellent example of a relatively low-cost, low-tech technology that seriously threatens a force that is far better trained, supported, and equipped overall.

Seeing how total air superiority, expensive armoured vehicles, and sophisticated electronic countermeasures can be no match for some guys with rusty old artillery shells and some wire is a humbling reminder of the limited utility of military force. Ingenuity, practicality, and humility will probably prove to be essential qualities as the US tries to find the least bad path out of Iraq, and while NATO tries to salvage the situation in Afghanistan.

Privacy and Facebook applications

I have mentioned Facebook and the expectation of privacy before. Now, the blog of the Canadian privacy commissioner is highlighting one of the risks. Because third party applications have access both to the data of those who install them and to the data of those users’ friends, they can be used to surreptitiously collect information from people in the latter group. While this widens the scope of what third party applications can do, it also seriously undermines the much-trumpeted new privacy features in the Facebook platform.

It just goes to reinforce what I said before: you should expect that anything you post on Facebook is (a) accessible to anyone who wants to see it and (b) likely to remain available online indefinitely. The same goes for most information that is published somewhere online, including on servers you operate yourself.