Insight into Google

Tomatoes on a vine

For someone who produces a site which covers a broad variety of topics, Google is an especially critical source of traffic (because people interested in one topic are unlikely to follow a site with a bunch of other random topics included). In my case, more than 60% of the traffic I received in the last year came as the result of Google searches. No other search engine produces more than 3.5%, and only 12% of visitors actually type in the URL, rather than clicking a link from a page of search results or another site.

Given the importance of Google, it is worth knowing a bit about how the organization operates. Over at All Things Digital, there are three interesting articles. The first covers the human evaluators Google uses to assess the effectiveness of its various search algorithms. The second discusses the attempts people make to game the system (inevitable, given the sheer amount of money that can be gained or lost by rising or falling in Google rankings). The third describes how Google intends to improve future search results.

One interesting fact mentioned in the first piece is that the option Google offers for users to hide results in their searches is used to refine their search algorithms. For instance, I am personally annoyed by websites that try to scrape together an identity page on someone, by grabbing snippets from here and there that seem related to them. Sites that do this include pipl.com, 123people.co.uk, zoominfo.com, and others. It is a bit encouraging that if enough people hide their unsolicited and error-prone amalgamations, their overall page rankings may eventually suffer.

Diet for nerds and computer programmers

Aero Ace biplane

John Walker, the founder of Autodesk, has written an interesting guide on health and weight loss, which is available for free online: The Hacker’s Diet.

Basically, the book focuses on the fundamental mathematical issues associated with weight loss and gain, and describes some useful techniques for transitioning to a lower weight. In particular, the moving average approach to measurement described seems quite valuable, insofar as it helps to separate the ‘signal’ of actual weight from the ‘noise’ of variation in things like water retention. The moving average generates a trend line that seems like it should provide more meaningful guidance than a scatterplot of individual data points, or even a simple curve fit to them.
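To make the idea concrete, here is a minimal sketch of a smoothed trend line in Python. It assumes an exponentially smoothed moving average with a smoothing factor of 0.1; the function name, the factor, and the sample weights are all illustrative assumptions, not a copy of Walker’s spreadsheets.

```python
def smoothed_trend(weights, smoothing=0.1):
    """Return an exponentially smoothed trend for a list of daily weights.

    Each trend value moves a fraction `smoothing` of the way from the
    previous trend value toward the new measurement, so day-to-day noise
    is damped. The 0.1 default is an assumption chosen for illustration.
    """
    if not weights:
        return []
    trend = [weights[0]]  # start the trend at the first measurement
    for measured in weights[1:]:
        previous = trend[-1]
        trend.append(previous + smoothing * (measured - previous))
    return trend


# Hypothetical daily readings (kg) that bounce around with water retention.
daily = [80.0, 81.2, 79.6, 80.4, 79.9, 79.1, 80.3]
for day, (measured, trend) in enumerate(zip(daily, smoothed_trend(daily)), start=1):
    print(f"day {day}: measured {measured:.1f}, trend {trend:.2f}")
```

The trend moves far less than the individual readings do, which is the point: a single heavy or light day barely shifts it, while a sustained change gradually pulls it along.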

The book also describes a 15-minute health regimen that ramps up in difficulty and is intended to serve as a minimum level of exercise for life.

The book is quite an unusual one, as health books go. For instance, it endorses frozen microwave dinners as a convenient way to get a predetermined number of calories. It also insists that exercise is not a critical weight loss strategy, and that some degree of suffering inevitably accompanies efforts to move closer to one’s ideal healthy weight. While I am sure people could take exception to this approach, it is good to have variety out there, and encouraging that tools are being created for the ever-larger number of people worldwide who are overweight or obese, and likely to suffer significant health risks as a consequence. Those who don’t want to mess around with Walker’s custom Excel files can use a web-based version of his approach at PhysicsDiet.com.

Sustainable Energy – Without the Hot Air

David MacKay’s Sustainable Energy – Without the Hot Air is a remarkably engaging book; it has certainly kicked off and contributed to some very energetic discussions here. The book, which was written by a physics professor at Cambridge and is available for free online, is essentially a detailed numerical consideration of renewable forms of power generation, as well as technologies to support them and to reduce total power demand. MacKay concludes that the effort required to produce sustainable energy systems is enormous, and that one of the most viable options is to build huge solar facilities in the world’s deserts, and use them to provide an acceptable amount of energy to everyone.

The book has a physics and engineering perspective, rather than one focused on politics or business. MacKay considers the limits of what is physically possible, given the character of the world and the physical laws that govern it. Given that he does not take economics into consideration much, his conclusions demonstrate the high water mark of what is possible, with unlimited funds. In the real world, renewable deployment will be even more challenging than it is in his physics-only model.

The book has already been discussed in a number of posts here, and I have added relevant information from it to the comment sections of a great many others, on everything from wind power to biofuels.

Even if you don’t agree with MacKay’s analysis, reading his book will provide some useful figures, graphs, and equations, as well as prompt a lot of thought. It is certainly one of the books that I would recommend most forcefully to policy makers, analysts, politicians, and those interested in deepening their understanding of what a sustainable energy future would involve.

Improving voicemail

While useful, voicemail is a flawed technology that can be improved in many ways. Three recent examples come to mind:

First, there is Apple’s visual voicemail. The improvement here is like the improvement between cassette tapes and compact discs: each message is an independent ‘track’ that can be treated as a unit. That is nicer than just having a single audio string to deal with, since you can see right away who called and jump to any message.

Secondly, there is the voicemail system of my VoIP provider. The nicest thing they do is provide an option to email you MP3s of your messages, which include caller ID to let you know who they are from. Now, I only call the actual voicemail number to periodically delete all the messages accumulating there.

Third, and neatest of all, is the transcription feature in Google’s forthcoming ‘voice’ product. Not only do you get to see who called, but you get an automated transcript. I am sure the voice recognition is far from perfect, but people seem to find it good enough to evaluate which messages need to be listened to, and which ones are just ‘call me back’ requests. To some extent, this even makes voicemail searchable, which is a neat trick.

While sound has character and authenticity to it, it is really a degraded form of communication, when it comes to simple searching and management. It is nice to see innovative ways to overcome the limitations of sound-based messages, while still retaining the original format, for those situations where you actually want to hear the message.

Pondering smartphones

Sasha Ilnyckyj in a cemetery

Soon, I will probably be switching cell phone plans, and possibly phones and providers as well. I am considering getting an internet-enabled phone, and pondering the various associated options. The most appealing phones are the iPhone and the HTC Android phone, followed by the Nokia smartphones. Using the first two would mean switching to Rogers.

In terms of the phone itself, I definitely prefer a physical keyboard to Apple’s error-prone on-screen version. That said, it would be nice to have a phone that was also an iTunes compatible iPod replacement… Does anybody have an HTC Dream or direct experience with a working one? I am curious how they compare with the iPhone for web browsing, email, and instant messaging.

I definitely don’t want to get locked into a three-year contract, so I am considering buying an unlocked phone as inexpensively as possible, then getting a one-year smartphone contract from Rogers. That way, if I move outside Canada, or get into a financial circumstance incompatible with expensive data plans, I won’t have to pay a massive fee to get out of the contract.

Contributing to Project Honeypot

Spammers are one of the most annoying natural enemies of the blogging community. They waste the time of site administrators who must install anti-spam systems and dig through suspicious comments to pick out real ones. They waste the time of users who are forced to jump through hoops like site registration and CAPTCHAs.

One way to help fight spam is to participate in Project Honeypot. If you run a website, they will give you a script to add somewhere. Then, you add links to the script that robots will follow, but not people. This allows the project to catalogue the IP addresses of robots, as well as track the general spam problem globally. People who run websites but don’t control the hosting (for instance, people with blogs on Blogger.com or WordPress.com) can add ‘QuickLinks’ which serve a similar function.

People running WordPress blogs can also use the http:BL WordPress Plugin to take advantage of Project Honeypot’s data and block spammers and harvesters of email addresses.

Setting up a honeypot only takes a couple of minutes, and gives the satisfaction of knowing you are helping to make the internet a slightly more civil place. In addition to running a honeypot and using the http:BL plugin, this site has a wiki protected with Bad Behaviour, a blog protected with Akismet, and spam defences built into .htaccess.

Threaded comments

WordPress now supports the option of threaded comments, where people can respond to a specific comment in a sub-thread, rather than just adding to the bottom of a single list.

Do people think incorporating this feature would improve this site, or make it less functional?

I would have no objections to giving it a whirl if doing so was easily reversible, but it seems certain that any switch back to linear comments would turn threaded conversations into confusing messes. As such, I would have to be pretty certain the shift would be beneficial in order to make it.

Microsoft’s imitation Google

Microsoft’s new Bing search engine is a bit bewildering. To call it an homage to Google is an understatement: it comes complete with ‘Web,’ ‘Images,’ ‘News,’ ‘Maps,’ etc. across the top bar. While the bird’s eye feature in Bing Maps is a bit neat (it seems like it might be based on HDR images), one cannot easily shake the feeling that Microsoft decided to respond to Google’s approach by outright copying it. The only oddity is that, because I have my Windows language set to British English (so it knows how to spell ‘colour’), Bing thinks I am in the UK, and the site offers me no option for showing Canadian results or news. Not very clever, given the ease with which an IP address can be turned into a location.

Has anybody discovered any Bing feature that is either quite different from or better than a Google offering? Hotmail certainly cannot begin to touch the searchable glory that is Gmail.

Hashing with Wolfram Alpha

Separately, I have discussed both the Wolfram Alpha computational knowledge engine and the practice of hashing information. The fact that WA allows anyone to do so easily has relevance for things like making bets online, in situations where players want to conceal their guesses until everyone else has put theirs up.

Here is an example. Say you want to place bets on who will win the next Republican presidential primary. You don’t want those who post later to have the advantage of knowing what others have already posted, so you do the following:

  1. Choose a hash algorithm (MD5 should be fine, but SHA is more secure)
  2. Have each participant put their guess into WA. Say I think it will be Sarah Palin. I would enter: SHA “I think the primary winner will be Sarah Palin, though I fear what she will do with the country” into Wolfram Alpha, and it would spit out something like “f7ca 4adf 11c7 5b56 f355 1635 5b50 2eca 5950 5349”.
  3. Note that the supplementary text, in addition to the name, is vital. Otherwise, it would be trivially easy for the other players to check the hashes for likely guesses and learn what people have chosen. Incorporating a salt into the hashing algorithm would be ideal, but WA doesn’t seem to have that capability.
  4. Have each participant post the hash of their response, saving the exact text somewhere secure to them.
  5. When the outcome is known, those who guessed correctly can confirm that fact, by providing text that hashes into their original post.
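
For anyone who would rather not depend on WA, the same steps can be sketched locally in Python with the standard hashlib module. This is only an illustration: the function names are my own, and I have assumed SHA-256, so the digest will not match whatever WA returns for the same text.

```python
import hashlib


def commit(message: str) -> str:
    """Return the hex digest that a participant would post publicly."""
    # SHA-256 is an assumption made for this sketch.
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def verify(message: str, posted_hash: str) -> bool:
    """Check that a revealed message matches a previously posted hash."""
    return commit(message) == posted_hash


# Step 2: hash the guess together with supplementary text.
guess = ("I think the primary winner will be Sarah Palin, "
         "though I fear what she will do with the country")
posted = commit(guess)   # step 4: post this, keep `guess` private
print(posted)

# Step 5: once the outcome is known, reveal the exact text.
print(verify(guess, posted))                  # the original text checks out
print(verify(guess + " (edited)", posted))    # any change breaks the match
```

The saved text has to be reproduced exactly, down to spacing and punctuation, or the check will fail.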

A somewhat roundabout and nerdy solution to a relatively unimportant problem, perhaps, but it illustrates some of the ways hashes can be used to prove what you said earlier, without having the content of your earlier message immediately accessible – a general ability with many applications.

One more fact about salts: they are the most straightforward way to foil attacks using rainbow tables.
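
As a rough sketch of what a salt adds, here is one way to mix a random value into a hashed guess using Python’s standard hashlib and secrets modules. The function names, the choice of SHA-256, and the 16-byte salt length are my own assumptions for illustration, not anything WA offers.

```python
import hashlib
import secrets


def commit_with_salt(guess: str):
    """Return (salt, digest). Post the digest; keep the salt and guess private."""
    salt = secrets.token_hex(16)  # 16 random bytes, shown as 32 hex characters
    digest = hashlib.sha256((salt + ":" + guess).encode("utf-8")).hexdigest()
    return salt, digest


def reveal_ok(guess: str, salt: str, digest: str) -> bool:
    """Recompute the hash from the revealed salt and guess, then compare."""
    recomputed = hashlib.sha256((salt + ":" + guess).encode("utf-8")).hexdigest()
    return recomputed == digest


salt_a, digest_a = commit_with_salt("Sarah Palin")
salt_b, digest_b = commit_with_salt("Sarah Palin")

print(digest_a != digest_b)                        # same guess, different hashes
print(reveal_ok("Sarah Palin", salt_a, digest_a))  # the honest reveal verifies
```

Because every commitment is hashed with its own random salt, a table of hashes computed in advance for likely guesses is useless, and a short guess such as a bare name no longer needs padding with supplementary text.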

Colour-based Google image searches

Google Image Search now lets you restrict results to images dominated by any one of twelve colours. For instance, the set of all photos from my site they have indexed can be restricted to just those with red highlights or those dominated by blue.

All told, Google currently includes 204 images from my site in their index. Here is the colour breakdown:

  • Red: 10
  • Teal: 7
  • White: 11
  • Orange: 17
  • Blue: 25 (lots of the sky)
  • Grey: 41 (many of them in black and white)
  • Yellow: 2
  • Purple: 2
  • Black: 47
  • Green: 8
  • Pink: 0
  • Brown: 45

You can also search for various image types: news content, faces, clip art, line drawings, and photo content.

As ever, Google Image Search is a somewhat perplexing creation. It’s not clear why it selects the photos it does or how it ranks them. I look forward to further improvements in the service.