Footprints all over the web – Google Web History

When I am online, I usually have at least one Google service open. At home, I keep a Google Mail window open at all times, along with Google Calendar. At work, it is only the latter. What I didn’t know until today is that whenever you are logged into your Google account, Google is tracking your web usage through a system called Web History. Accessing the system allows you to ‘pause’ the recording and even delete what is already there. While the listings disappear from your screen, there is good reason to doubt whether they vanish from Google’s records.

It is common knowledge that Google saves every search query submitted to it, and does so in a way that can be linked to an individual computer. The web history service, however, has more troubling implications. Whether you are at work, at home, or at an internet cafe, you only need to be logged into any Google service for it to be recording. Since more than one computer can be logged into a Google account at once, and there is no indication on either machine that this is happening, anybody who gets your password can monitor your web usage, as well as your email and any other Google services you use. Given how common keyloggers have become, this should worry people.

One very helpful feature Google could implement would be the option to show when and where you last logged into your account. That way, if someone has been peeking at your email from London while you have been in Seattle, you would know that it may be time to change your password. Also desirable, but much less likely to happen, would be a requirement that services like GMail store your information as an encrypted archive. Even if the encryption were based on your password and a relatively weak cipher, it would make it impractical for either Google or malicious agents with access to their information storage systems to undertake the wholesale mining of the information therein.
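
To make the encryption idea a little more concrete, here is a rough sketch of what password-based encryption of individual messages might look like. It is only an illustration under my own assumptions, not a description of anything Google does: the function names, the PBKDF2 work factor, and the blob layout are all invented for the example, and the keystream is a deliberately simple stand-in for a properly vetted cipher such as AES.

```python
import hashlib
import hmac
import secrets

PBKDF2_ROUNDS = 200_000  # assumed work factor, picked only for illustration


def derive_keys(password: str, salt: bytes) -> tuple[bytes, bytes]:
    """Stretch the password into two 32-byte keys (encryption, integrity)."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS, dklen=64
    )
    return material[:32], material[32:]


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a block counter.

    A stand-in for a real cipher; do not use this to protect real data.
    """
    blocks = [
        hashlib.sha256(enc_key + nonce + counter.to_bytes(8, "big")).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def encrypt_message(keys: tuple[bytes, bytes], plaintext: str) -> bytes:
    """Return nonce + tag + ciphertext for a single message."""
    enc_key, mac_key = keys
    nonce = secrets.token_bytes(16)  # fresh per message
    data = plaintext.encode("utf-8")
    stream = _keystream(enc_key, nonce, len(data))
    body = bytes(a ^ b for a, b in zip(data, stream))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body


def decrypt_message(keys: tuple[bytes, bytes], blob: bytes) -> str:
    """Check the tag, then recover the message text."""
    enc_key, mac_key = keys
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password or tampered message")
    stream = _keystream(enc_key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")
```

In this sketch the salt would be stored alongside the account, and the keys would be derived at login with `derive_keys(password, salt)`. Because each message is sealed separately, anyone reading the raw storage without the password would see only opaque blobs, which is the property that would make wholesale mining impractical.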

The final reason this is concerning has to do with cooperation between companies and governments. It is widely rumoured that companies including Microsoft and Yahoo have helped the Chinese government to track down and prosecute dissidents, by turning over electronic records held outside China. Given the increasingly bold snooping of both democratic and authoritarian governments, a few more layers of durable protection built into the system would be prudent and encouraging.

Author: Milan

In the spring of 2005, I graduated from the University of British Columbia with a degree in International Relations and a general focus in the area of environmental politics. In the fall of 2005, I began reading for an M.Phil in IR at Wadham College, Oxford. Outside school, I am very interested in photography, writing, and the outdoors. I am writing this blog to keep in touch with friends and family around the world, provide a more personal view of graduate student life in Oxford, and pass on some lessons I've learned here.

19 thoughts on “Footprints all over the web – Google Web History”

  1. 158 Million Records Exposed (And Counting)

    Lucas123 writes “According to the Privacy Rights Clearinghouse, 158 million records have been exposed over the past two years as a result of inadequate security. Data’s less secure today because as fast as banks, merchants and consumers add new layers of security to their storage systems and networks, new technologies, or simply careless users, create new security holes, according to Bob Scheier at Computerworld.”

  2. Hey Milan,

    Just a minor thing, but Google uses a highly distributed and optimized file storage. It would be impractical to encrypt everything due to performance problems, and would likely not improve the internal security at all. Their filesystem addresses individual chunks typically in gigabytes, so a snooper would likely need access to an internal tool if they wanted to make any sense of the data.

    It also wouldn’t make business sense for them to make their own data hard to mine. In fact their whole infrastructure was designed to do the opposite. I think that they will act in whatever way affords them the most profit, and some uses of data mining certainly fit that description. It may be a high-profile incident or two before they do a terminal-style “last login: xxx”, though.

  3. Brian,

    Why wouldn’t encrypting GMail archives improve internal security? I am thinking particularly of security in relation to Google employees and/or anyone they choose to give access to (whether with the approval of their managers or not).

    The fact that I use GMail clearly suggests that I think the utility of the service outweighs the risks of using it. Still, I would prefer it to be more secure.

    “It may be a high-profile incident or two before they do a terminal-style “last login: xxx”, though.”

    It seems unlikely that incidents could be definitively attributed to the kinds of attack the present arrangement is vulnerable to. That said, it is not impossible to construct some security systems of your own that rely upon similar ideas.

  4. Brian,

    On the matter of performance issues, would it be more feasible to encrypt messages one by one in a way transparent to the user, rather than that user’s whole archive? Alternatively, it could be done in relatively small chunks.

  5. Hi Milan,

    I believe Google Web History only works if you have the Google Toolbar installed, so it is basically opt-in. I know I certainly don’t have it turned on, finding it a bit worrying.
    Search History on the other hand (recording your web searches, but not every website you visit) is recording whenever you are logged into your account, and you have to opt out of this, I believe.
    The downside of opting out is that search personalization gets turned off too (or at least, so they say), although I am not convinced Google’s personalized search is much good at the moment.

  6. “I believe Google Web History only works if you have the Google Toolbar installed, so it is basically opt-in.”

    I have never installed Google Toolbar, and yet my Web History was full of searches, going back for months, when I looked at it. As far as I can tell, it only records Google searches of various types, including images.

    What is the use of search personalization? It seems to tailor searches by location even when you aren’t logged into any Google services.

  7. “My Web History was full of searches, going back for months”

    -Right. That’s “Search History”, which is recording whenever you’re logged into your Google account.
    “Web History” (which is recording if you use the Google Toolbar and opt in) records not only your searches, but any website you visit, even if you just type the URL in directly.

    You can turn off both of these if you want. However, Google will still tie your searches together using a cookie and keep these records for two years. If you really don’t want Google to keep any records about your searches, you have to turn off cookies (too annoying to be practical) or use a proxy like say
    http://www.blackboxsearch.com/

    or if you really care about your privacy, and don’t mind slow internet, use Tor:
    http://tor.eff.org/

    Re: Personalization – you’re right, they do tailor by location even when you’re not logged in. But they also try to tailor your results based on your search history. So if you search for say “hash table”, they might use your history to decide if you’re more likely to be interested in computer science articles or drug paraphernalia. From what I can gather, the personalization is pretty crude and not much use right now, but it is hard to tell.

  8. Published Google Docs To Appear In Search Engines

    “Google plans to make all published documents from Google Docs users crawlable, if the documents are linked from a public Web site. No official announcement appears to have been made, just a short blog post on the subject by a Google employee in a help forum. (One comment on the ghacks.net post linked above says that email was sent to the admins of Google Apps accounts.) There does not seem to be any way to make an individual document not crawlable; you can only un-publish it, at which point Web links to it will not work any more.” The move makes sense from one point of view — Google is just making crawlable a document linked from another crawlable document — but it’s likely to catch a lot of people by surprise.

  9. Google Voice voicemails appearing in public search results

    We’re not exactly sure what’s going on here, but it certainly seems like at least some Google Voice voicemails are being indexed and made publicly available somehow. If you punch in “site:https://www.google.com/voice/fm/*” as a search string you get a few pages of what appear to be test messages, with a couple eye-opening obvious non-tests scattered in there as well. Dates on these messages range from a couple months ago all the way until yesterday, so this is clearly an ongoing issue — hopefully Google patches this up awful fast.

  10. That’s quite a privacy slip-up. You can easily imagine some highly personal or sensitive information being in a voicemail message.

  11. You’re right that Google doesn’t actually delete your e-mail when you click “Delete.” According to a spokeswoman, e-mail remains on Google’s servers for 60 days after you’ve trashed it. Other Google services have a similar lag. It takes up to 30 days for a deleted document to get off Google Docs, 60 days for a deleted picture to vanish from Picasa Web Albums, and 90 days for deleted voice mail to be freed from Google Voice. Google, like all Web companies, does comply with court orders to produce user information, though it also has a history of fighting egregious requests. In 2006, Google managed to get a court to limit significantly the scope of a Justice Department subpoena requesting two months’ worth of search data from Google users.

    But to be on the safe side, you’d better delete anything that might be of interest to the government at least two months before they’re likely to get wind of it. Or keep evidence of your shady dealings in a USB drive under your mattress.

  12. 76% Web Users Affected By Browser History Stealing

    “Web browser history detection with the CSS :visited trick has been known for the last ten years, but recently published research suggests that the problem is bigger than previously thought. A study of 243,068 users found that 76% of them were vulnerable to history detection by malicious websites. Newer browsers such as Safari and Chrome were even more affected, with 82% and 94% of users vulnerable. An average of 63 visited locations were detected per user, and for the top 10% of users the tests found over 150 visited sites. The website has a summary of the findings; the full paper (PDF) is available as well.”

  13. Who does Google know that you know?
    Xeni Jardin at 4:07 PM Friday, Aug 6, 2010

    [It] is the network of connections Google uses to identify relevant social search results. It is based on a combination of the following:

    • Direct connections from your Google chat buddies and contacts
    • Direct connections from links that appear on your Google profile
    • Secondary connections that are publicly associated with your direct connections

    In addition to web pages from your social circle, posts from your Google Reader subscriptions may also appear in your social search results.

  14. Retargeting Ads Stalk You For Weeks After You Shop

    “The New York Times is reporting on a new kind of web ad that takes products you were looking at purchasing on one site and continually advertises them in front of you at subsequent sites. After looking at shoes at Zappos, a mother in Montreal noticed the shoes followed her: ‘For days or weeks, every site I went to seemed to be showing me ads for those shoes. It is a pretty clever marketing tool. But it’s a little creepy, especially if you don’t know what’s going on.’ The spreading ploy is called ‘retargeting ads’ and is really just a good demonstration of how an old technology (all they use are leftover browser cookies) can be truly invasive and privacy violating. Opponents are clamoring for government regulation to protect the consumer and one writer mentioned a consumer ‘do not track’ list, adding that retailers really show little fear of turning off customers with their invasion.”

  15. There is no way to get around the fact that the current batch of top Web browsers were designed as advertising delivery systems first and editorial delivery systems second. The companies behind three of the four top browsers, Microsoft (Internet Explorer), Google (Chrome), and Apple (Safari), are all deeply invested in the advertising business. The company that makes the Firefox browser has been the beneficiary of Google millions, which come primarily from advertising. The folks who make the Opera browser have likewise cashed Google’s checks.

    None of the software companies set out to make porous, easily breached browsers. But it hasn’t been in their interests to make impregnable ones. The best illustration of who is driving Web-browser development can be found in the Aug. 2, 2010, Wall Street Journal article “Microsoft Quashed Effort to Boost Online Privacy.” The piece documents how Microsoft product planners wanted to bake security features into the company’s Internet Explorer 8 Web browser that would “automatically thwart common tracking tools,” as the Journal reports.

  16. The best journalism about how marketers hunt and record your journeys into Cyberia can be found in the Wall Street Journal’s ongoing What They Know series. They aggressively spy on Web users, building dossiers on your likes and dislikes, your gender, your income level, your place of residence, and even your health. They then sell this information to other firms. For instance, this Journal piece shows how InfoCheckUSA “scrapes” social-media data and markets them to companies that are assessing job applicants. This piece reports the comeback of “deep packet inspection,” which profiles Internet users based on the data generated by their Web surfing. These companies and others make browsing the Web akin to walking down a block with 47 security cameras peering into your wallet and psyche. This spying has become so rampant that even Web giants like Comcast and Microsoft aren’t always aware of every powerful tracking cookie they drop on users’ computers, as this report from the Journal series shows.

  17. The suit, among the first to target history sniffing, alleges that YouPorn used technology that can “peek in on the plaintiffs’ Internet-visitation history” by exploiting a vulnerability in Web browsers and failed to disclose to users that the site was doing so. The suit, filed in U.S. District Court for the Central District of California, seeks unspecified damages and an injunction to stop YouPorn from using the technology. Plaintiffs David Pitner and Jared Reagan, both of Newport Beach, Calif., allege that their privacy was violated by YouPorn and are seeking class-action status for their suit.

    The site was one of 46 listed in an October paper by researchers at the University of California, San Diego, as running history-sniffing technology. The sites cover a range of topics, including sports and investing.
