Unicity distance

Sky, moon, and wires

In order to decipher a secret message through cryptanalysis, you need a sufficient quantity of data to evaluate whether the deciphering has been done properly. If all a cryptanalyst has to work with is enciphered text (say, in the form of an intercepted message), the attempt to decipher it is called a ciphertext-only attack. For a variety of reasons, these are very tricky to accomplish. The element described below is one of the most basic.

In order to understand why a message of sufficient length is important, consider a message that consists only of a single enciphered phone number: “724-826-5363.” These numbers could have been modified in any of a great number of ways: for instance, adding or subtracting a certain amount from each digit (or alternating between adding and subtracting). Without knowing more, or being willing to test lots of candidate phone numbers, we have no way of learning whether we have deciphered the message properly. On the basis of the ciphertext alone, 835-937-6474 is just as plausible as 502-604-3141.
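To make the ambiguity concrete, here is a minimal Python sketch of the digit-shift cipher described above (a hypothetical illustration, not any standard scheme). Every one of the ten possible keys yields an equally plausible phone number:

def shift_digits(number: str, key: int) -> str:
    # Shift every digit by `key`, modulo 10; leave dashes untouched.
    return "".join(str((int(c) + key) % 10) if c.isdigit() else c
                   for c in number)

ciphertext = "724-826-5363"
for key in range(10):
    # keys 1 and 8 yield 835-937-6474 and 502-604-3141 respectively
    print(key, shift_digits(ciphertext, key))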

Obviously, this is only a significant problem for short messages. One could imagine ways in which BHJG could mean ‘HIDE’ or ‘TREE’ or ‘TRAP’: the use of different keys with the same algorithm could generate any four-letter word from that ciphertext. Once we have a long enough enciphered message, however, it becomes much more obvious when we have deciphered it properly. If I know that the ciphertext:

UUEBJQPWZAYIVMNAZSUQPYJVOMDGZIQHWZCX

has been produced using the Vigenère cipher, and I find that it deciphers to:

IAMTHEVERYMODELOFAMODERNMAJORGENERAL

when I use the keyword MUSIC, it is highly likely that I have found both the key and the unenciphered text.
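For concreteness, here is a minimal Python sketch of Vigenère decryption (uppercase A–Z only), enough to verify the example above:

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    out = []
    for i, c in enumerate(ciphertext):
        shift = ord(key[i % len(key)]) - ord("A")  # each key letter is a Caesar shift
        out.append(chr((ord(c) - ord("A") - shift) % 26 + ord("A")))
    return "".join(out)

print(vigenere_decrypt("UUEBJQPWZAYIVMNAZSUQPYJVOMDGZIQHWZCX", "MUSIC"))
# prints IAMTHEVERYMODELOFAMODERNMAJORGENERAL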

This concept is formalized in the idea of unicity distance, introduced by Claude Shannon in the 1940s. Unicity distance describes the amount of ciphertext we must have in order to be confident that we have found the right plaintext. It is a function of two things: the entropy of the plaintext message (something written in proper English is far less random than a phone number) and the length of the key being used for encryption.

To calculate the unicity distance for a message written in English, divide the length of the key in bits (say, 128 bits) by 6.8 (a measure of the level of redundancy in English, in bits per character). With about nineteen characters of ciphertext (128 ÷ 6.8 ≈ 18.8), we can be confident that we have found the correct message and not simply one of a number of possibilities, as in the phone number example. By definition, compressed files have redundancy removed; as such, you may want to divide the key length by about 2.5 to get their unicity distance. For truly random data, the level of redundancy is zero, so the unicity distance is infinite. If I encipher a random number and send it to you, a person who intercepts it will never be able to determine – on the basis of the ciphertext alone – whether they have deciphered it properly.
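The calculation itself is simple. Here is a rough sketch, using the redundancy figures above (6.8 bits per character for English, roughly 2.5 for compressed files):

def unicity_distance(key_bits: float, redundancy: float) -> float:
    # U = H(K) / D: key entropy in bits over per-character redundancy
    if redundancy == 0:
        return float("inf")  # random plaintext: no amount of ciphertext suffices
    return key_bits / redundancy

print(unicity_distance(128, 6.8))  # ~18.8 characters of English
print(unicity_distance(128, 2.5))  # ~51 characters of compressed data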

For many types of data files, the unicity distance is comparable to that of normal English text. This holds for word processor files, spreadsheets, and many databases. Indeed, many types of computer files have significantly smaller unicity distances because they have standardized beginnings. If I know that a file sent each morning begins with: “The following is the weather report for…” I can determine very quickly whether I have deciphered it correctly.

Actually, the last example is particularly noteworthy. When cryptanalysts are presented with a piece of ciphertext produced using a known cipher (say, Enigma) and known to include a particular string of text (such as the weather report introduction), it can become enormously easier to determine the encryption key being used. These bits of probable text are called ‘cribs’ and they played an important role in Allied codebreaking efforts during the Second World War. The use of the German word ‘Wetter’ at the same point in messages sent at the same time each day was quite useful for determining what that day’s key was.

Secrets and Lies

Ottawa church

Computer security is an arcane and difficult subject, constantly shifting in response to societal and technological forces. A layperson hoping to get a better grip on the fundamental issues involved can scarcely do better than to read Bruce Schneier’s Secrets and Lies: Digital Security in a Networked World. The book sits at the middle of the spectrum of his work, with Beyond Fear at one end as a general primer on all security-related matters and Applied Cryptography at the other, providing far more detail than non-experts will ever wish to absorb.

Secrets and Lies takes a systematic approach, describing types of attacks and adversaries, stressing how security is a process rather than a product, and explaining a great many offensive and defensive strategies in accessible ways and with telling examples. Schneier stresses the impossibility of preventing all attacks, and hence the importance of maintaining detection and response capabilities. He also demonstrates strong awareness of how security products and procedures interact with the psychology of system designers, attackers, and ordinary users. Most surprisingly, the book is consistently engaging and even entertaining. You would not expect a book on computer security to be so lively.

One critical argument Schneier makes is that the overall security of computing can only increase substantially if vendors become liable for security flaws in their products. When a bridge collapses, the construction and engineering firms end up in court. When a ten-year-old bug in Windows NT causes millions of dollars in losses for a company using it, Microsoft may see fit to finally issue a patch. Using regulation to structure incentives and shape behaviour is an approach that works in a huge number of areas. Schneier shows how it can be made to work in computer security.

Average users probably won’t want to read this book – though elements of it would probably entertain and surprise them. Those with an interest in security, whether principally in relation to computers or not, should read it mostly because of the quality of Schneier’s thought processes and analysis. The bits about technology are quite secondary and easily skimmed. Most people don’t need to know precisely how smart cards or the Windows NT kernel are vulnerable; they need to know what those vulnerabilities mean in the context of how those technologies are used. Reading this book will leave you wiser in relation to an area of ever-growing importance. Those with no special interest in computers are still strongly encouraged to read Beyond Fear: especially if they are legislators working on anti-terrorism laws.

Materials science and transgenic animals

Oil spill analysis equipment

One of the most interesting ongoing developments in materials science involves the borrowing of biologically originated materials and processes. This is old news for people who follow science reporting, but it seems worth mentioning to others.

In the first instance, there is the copying of chemical tricks that exist in nature. People have speculated about copying the wall-sticking abilities of gecko feet, for instance. By artificially producing structures similar to those on the feet, excellent non-chemical adhesives could be made. Gecko feet are sufficiently adhesive to hold several hundred times the weight of the animal. Furthermore, they can be attached and detached at will by altering the geometry of the setae that produce the adhesion through van der Waals forces.

In the second instance, people have been exploiting biological processes to produce existing things in more effective ways. A favourite way to do this is through pharming, in which new genes are introduced into species in order to turn them into pharmaceutical factories. For instance, goats have been genetically engineered to produce an anti-clotting drug in their milk, which can then be extracted, purified, and used by humans. The drug, called ATryn, treats hereditary antithrombin deficiency: a condition that makes people especially vulnerable to deep-vein thrombosis. The principal benefits of using goats are financial, as described in The Economist:

Female goats are ideal transgenic “biofactories”, GTC claims, because they are cheap, easy to look after and can produce as much as a kilogram of human protein per year. All told, Dr Cox reckons the barn, feed, milking station and other investments required to make proteins using transgenic goats cost less than $10m—around 5% of the cost of a conventional protein-making facility. GTC estimates that it may be able to produce drugs for as little as $1-2 per gram, compared with around $150 using conventional methods.

Transgenic goats are also being used to produce spider silk on an industrial scale. That super-strong material could be used in everything from aircraft to bullet-proof vests. Different varieties of spider silk could be used to produce materials with varying strengths and elasticities.

While the former behaviour seems fairly unproblematic (we have been copying from nature for eons), the latter does raise some ethical issues. Certainly, it involves treating animals as a means to greater ends – though that is also an ancient activity. People have generally been more concerned about the dangers to people and the natural world from such techniques: will the drugs or materials produced be safe? Will the transgenic animals escape and breed with wild populations? These are reasonable concerns that extend well beyond the genetic or materials expertise possessed by the scientists in question.

The potential of such techniques is undeniably considerable. One can only hope that a combination of regulation and good judgment will avoid nightmare situations of the kind described in Oryx and Crake. So far, our genetically modified creatures tend to be inferior to their natural competitors. According to Alan Weisman, virtually all of our crops and livestock would be eliminated by predation and competition within a few years, in the absence of human care and protection. It remains to be seen whether the same will be true of plants and animals that currently exist only in the imaginations of geneticists.

On technology and vulnerability

The first episode of James Burke’s Connections is very thought-provoking. It demonstrates the inescapable downside of Adam Smith’s pin factory: while an assembly line can produce far more pins than individual artisans, each of the assembly line workers becomes unable to produce anything without the industrial network that supports their work.

See this prior entry on Burke’s series

Protecting sources and methods

Rusty metal wall

By now, most people will have read about the Canadian pedophile from Maple Ridge who is being sought in Thailand. The story is a shocking and lamentable one, but I want to concentrate here on the technical aspect. INTERPOL released images of the man, claiming it had undone the Photoshop ‘twirl’ effect that had been used to disguise him in the compromising photos. While this claim has been widely reported in the media, there is at least some reason to question it. It is possible that INTERPOL is concealing the fact that it received unaltered photos from another source, which could have been anything from intercepted emails to files recovered from an improperly erased camera memory card. The unaltered face could even have been recovered from the thumbnail images many cameras embed in EXIF metadata. It is also possible that this particular effect is so easy to reverse (and that the technique is so widely known to exist) that INTERPOL saw no value in keeping its methods secret. A quick Google search suggests that the ‘twirl’ effect is a plausible candidate for easy reversal.
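To see why a twirl is a plausible candidate for reversal, note that it rotates each pixel around the image centre by an angle that depends only on that pixel’s radius. Applying the same distortion with the angle negated therefore undoes it almost exactly. Here is a rough Python sketch for a grayscale image (an illustration of the general idea, not INTERPOL’s actual method; the falloff function is an arbitrary choice):

import numpy as np
from scipy.ndimage import map_coordinates

def twirl(image: np.ndarray, strength: float) -> np.ndarray:
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w]
    dy, dx = y - (h - 1) / 2, x - (w - 1) / 2
    r = np.hypot(dx, dy)
    theta = strength * np.exp(-r / (0.25 * max(h, w)))  # twist angle falls off with radius
    # for each output pixel, sample the input at the rotated position
    sx = (w - 1) / 2 + dx * np.cos(theta) - dy * np.sin(theta)
    sy = (h - 1) / 2 + dx * np.sin(theta) + dy * np.cos(theta)
    return map_coordinates(image, [sy, sx], order=1)

# Because rotation preserves radius, twirl(img, s) is undone (up to
# interpolation error) by applying the opposite strength:
# restored = twirl(twirl(original, 2.0), -2.0)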

Providing an alternative story to explain the source of information is an ancient intelligence tactic. For instance, during the Second World War the British created an imaginary spy ring and used it to justify possession of some of the information that had actually been obtained through cracked Enigma transmissions at Bletchley Park. Some have argued that British intelligence knew about the Coventry bombing in advance through deciphered messages, but chose not to evacuate the city because they did not want to reveal to the enemy that German ciphers had been compromised. While this particular example may or may not be historically accurate, it illustrates the dilemma of somebody in possession of important intelligence acquired in a sensitive manner.

Cover stories can conceal sources and methods in other ways. A few years ago, it was claimed that Pervez Musharraf had escaped having his motorcade bombed thanks to a radio jammer. While that is certainly possible, it seems unlikely that his guards would have revealed the existence of the system if it had played such a crucial role. More likely, they were tipped off by an informant in the group responsible, an agent they had implanted in it, or some sort of communication intercept. Given that it is now widely known that email messages and phone calls worldwide are regularly intercepted by governments, I imagine a lot of spies and informants are being protected by false stories about communication intercepts.

In short, it is fair to say that any organization concerned with intelligence gathering will work diligently to protect its sources and methods. After all, these are what ensure its future access to privileged information. While there is a slim chance INTERPOL intentionally revealed its ability to unscramble photographs as some sort of deterrent, it seems unlikely. This situation will simply encourage people to use more aggressive techniques to conceal their faces in the future. It is also possible that, in this case, INTERPOL felt that getting the man’s image out was more important than protecting its methods. In my opinion, it seems most likely that the ‘twirl’ effect really is easy to unscramble and that INTERPOL saw little value in not publicizing this fact. That said, it remains possible that a more complex collection of tactics and calculations has been applied.

Mac security tips

Gatineau Park, Quebec

During the past twelve months, 23.47% of visits to this blog have been from Mac users. Since there are so many of them out there, I thought I would share a few tips on Mac security. Out of the box, OS X does beat Windows XP on security – partly for design reasons and partly because it isn’t as worthwhile to come up with malware that attacks an operating system with a minority of users. Even so, taking some basic precautions is worthwhile. The number one tip is behavioural, rather than technical: be cautious about the websites and emails you view, the files you download, and the software you install.

Here are more detailed guides from a company called Corsair (which I know nothing about) and from the American National Security Agency (who knew they used Macs?). The first is specific to Tiger (10.4), while the second covers the older Panther (10.3). I expect they will both remain largely valid for the upcoming Leopard (10.5).

Some more general advice I wrote earlier: Protecting your computer.

PS. I am curious about the one person in the last orbit who accessed this site using OS/2 Warp, back on February 17th. I hope it was one of the nuns from the ads.

Fixed-wing / helicopter hybrids

A good number of readers probably know something about the V-22 Osprey tilt-rotor aircraft. They may recall the ad that Bell Helicopter Textron ran in The National Journal which explained that the aircraft “descends from the heavens” but “unleashes hell.” This would probably have attracted less controversy if it hadn’t shown American troops rappelling onto the roof of a mosque.

Many people argue that the V-22 is unsafe. Fewer people realize that it was a second attempt at this sort of vehicle. A predecessor called the DP-2 was even less successful.

What is it that makes vertical take-off and landing aircraft so difficult to design?

Once more on the importance of backups

As mentioned before, the best defence against data loss from viruses or hardware damage is to make comprehensive, frequent backups. As such, I propose the following rule of thumb:

If a piece of data is worth more than the drive space it occupies, a second copy should exist somewhere else.

Nowadays, you can easily pick up hard drives for less than $1 per gigabyte. At those prices, it probably isn’t just personal photos and messages that are worth saving, but any bulk data (movies, songs, etc) that would take more than $1 per gigabyte in effort to find and download again.

Mac users should consider downloading Carbon Copy Cloner. It produces bootable, byte-for-byte copies of entire drives. That means that even if the hard drive in your computer dies completely and irreparably, you can run your system off an external hard drive, with all the data and functionality it possessed when you made the most recent copy.

One nice perk of having one or more such copies is that they let you undo mistakes. If you accidentally erase or corrupt an important file, you can go back and grab it. Likewise, if you install a software update that proves problematic, you can shift your entire system back to an earlier state.

[Update: 22 January 2010] Since I wrote this article, Apple released new versions of OS X with their excellent Time Machine backup software built-in. I strongly encourage all Mac users to take advantage of it.

Five gigatonne globe

∑ 5 Gt CO2e

[Update: 22 January 2009] Some of the information in the post below is inaccurate. Namely, it implies that some level of continuous emissions is compatible with climate stabilization. In fact, stabilizing the climate requires humanity to have zero net emissions in the long term. For more about this, see this post.

As discussed before, credible estimates of the carbon absorbing capacity of the biosphere are around five billion tonnes (five trillion kilograms) per year of carbon dioxide equivalent.

The graphic above is probably far too nerdy to have popular appeal – and it is possible the numerical figure will need revision – but it does strike me as a concise expression of what needs to be tackled.

A suggestion to Google

One cool feature of Google is that it performs unit conversions. It makes it easy to learn that 1000 rods is the same as 2750 fathoms. One useful addition would be the calculation of carbon dioxide equivalents: you could plunk in “250 tonnes of methane in CO2 equivalent” and have it generate the appropriate output, based on the methodology of the IPCC. The gases for which the calculator should work would also include nitrous oxide, SF6, HCFCs, HFCs, CFCs, and PFCs.
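The conversion itself is just multiplication by each gas’s global warming potential (GWP). Here is a hedged Python sketch; the GWP values are illustrative 100-year figures from the IPCC’s Fourth Assessment Report, and the exact numbers depend on which assessment is used:

# 100-year global warming potentials (illustrative, per IPCC AR4)
GWP_100 = {"CO2": 1, "CH4": 25, "N2O": 298, "SF6": 22800}

def to_co2e(tonnes: float, gas: str) -> float:
    # tonnes of gas times its 100-year global warming potential
    return tonnes * GWP_100[gas]

print(to_co2e(250, "CH4"))  # 250 tonnes of methane -> 6250 tonnes of CO2e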

Sure, this feature would be useful to fewer than one person in a million, but Google has often shown itself willing to cater to the needs of techie minorities.