June 2012 Web Privacy Census

Public policymakers are proposing measures to give consumers more privacy rights online. These measures rest on the assumption that the web privacy landscape has become worse for consumers, that is, that their online activities are tracked more pervasively now than in the past. This assumption may be true, as online advertising and metrics companies have developed more sophisticated ways to track and identify individuals online. This development has been documented in the academic literature and in the popular press, notably through an influential news series, “What They Know,” by Wall Street Journal reporters.
As policymakers consider different approaches for addressing internet privacy, it is critical to understand how interventions such as negative press attention, self-regulation, Federal Trade Commission enforcement actions, and direct regulation affect tracking. As early as 1995, Beth Givens of the Privacy Rights Clearinghouse suggested that federal agencies create benchmarks for online privacy. The first attempts at web measurement, discussed in our literature review, found relatively little tracking online in 1997: only 23 of the most popular websites were using cookies on their homepages. But within a few years, tracking for network advertising was present on many websites, and by 2011, all of the most popular websites employed cookies.
The Web Privacy Census is intended to formalize the benchmarking process and measure internet tracking consistently over time. We seek to explore:
  • How many entities are tracking users online?
  • What vectors (technologies) are most popular for tracking users?
  • Is there displacement (i.e., a shift from one tracking technology to another) in tracking practices?
  • Is there greater concentration of tracking companies online?
  • What entities have the greatest potential for online tracking and why?
This effort was developed and executed in partnership with Abine, Inc. Abine has been our technical collaborator and resource partner, helping us develop a reliable method for web crawling and analysis of tracking vectors.
In this report, we discuss the results of a crawl conducted on May 17, 2012. We found cookies on all popular websites, by which we mean the 100 most popular sites according to Quantcast. We conducted two different crawls: a shallow crawl, in which our test browser visits only the homepage of a site, and a deep crawl, in which it visits six links on a site. Our shallow crawl of the 25,000 most popular sites revealed that 87% had cookies (24% first party, 76% third party), 9% had HTML5 storage objects, and less than 0.0001% had Flash cookies. Twenty-five percent of cookies had names such as “UID” and “GUID,” suggesting that they are used to uniquely identify users. Overall, we found that Flash cookie usage is dropping and HTML5 storage use is rising. At least one tracker is using HTML5 local storage to hold unique identifiers from third-party cookies.
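To make the cookie figures above more concrete, the following is a minimal sketch of how crawled cookies might be labeled as first or third party and flagged as likely identifiers. It is not the tooling used for this census. The `Cookie` type, the function names, the substring test on cookie names, and the rule of comparing the last two labels of each host name are all assumptions made for illustration; a careful analysis would consult the Public Suffix List rather than rely on the two-label shortcut.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cookie:
    name: str
    domain: str  # domain the cookie was set for, e.g. ".tracker.example"


# Assumed name fragments; the report cites "UID" and "GUID" as examples.
ID_HINTS = ("uid", "guid")


def base_domain(host: str) -> str:
    """Naive stand-in for a registrable domain: the last two host labels.

    Assumption: this is wrong for suffixes such as "co.uk"; a real
    analysis would use the Public Suffix List.
    """
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(cookie: Cookie, page_host: str) -> bool:
    """A cookie is third party if its base domain differs from the page's."""
    return base_domain(cookie.domain) != base_domain(page_host)


def looks_like_identifier(cookie: Cookie) -> bool:
    """Flag names containing an identifier-like fragment (case-insensitive)."""
    name = cookie.name.lower()
    return any(hint in name for hint in ID_HINTS)


def summarize(cookies, page_host):
    """Count first-party, third-party, and identifier-like cookies."""
    total = len(cookies)
    third = sum(is_third_party(c, page_host) for c in cookies)
    id_like = sum(looks_like_identifier(c) for c in cookies)
    return {
        "total": total,
        "first_party": total - third,
        "third_party": third,
        "id_like": id_like,
    }


# Hypothetical cookies observed on a visit to www.news.example.
observed = [
    Cookie("session", "www.news.example"),
    Cookie("GUID", ".ads.tracker.example"),
    Cookie("uid", "tracker.example"),
    Cookie("prefs", ".news.example"),
]
summary = summarize(observed, "www.news.example")
print(summary)
```

A name-based flag of this kind only suggests that a cookie carries a unique identifier; it can miss identifiers stored under unrelated names and can flag names that merely happen to contain the fragment.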
