Authors: Roberto Gonzalez, Lili Jiang, Mohamed Ahmed, Miriam Marciel, Ruben Cuevas, Hassan Metwalley, Saverio Niccolini
Users are commonly tracked with HTTP cookies as they browse the web. To protect their privacy, they tend to rely on simple tools that block cookie activity. However, the “block all” design of these tools breaks critical web services or severely limits the online advertising ecosystem. To ease this tension, a more nuanced strategy is required, one that better discerns the intended functionality of the HTTP cookies users encounter. We present the first large-scale study of the use of HTTP cookies in the wild, using network traces containing more than 5.6 billion HTTP requests from real users over a period of two and a half months. We first present a statistical analysis of how cookies are used. We then analyze the structure of cookies and observe that HTTP cookies are significantly more sophisticated than the name=value format defined by the standard and assumed by researchers and developers. Based on our findings, we present an algorithm that is able to extract the information included in 86% of the cookies in our dataset with an accuracy of 91.7%. Finally, we discuss the implications of our findings and provide solutions that can be used to improve the most promising privacy-preserving tools.
- We analyze the information contained in HTTP cookies using real data from real users.
- We discover that HTTP cookies are far more complicated than indicated by the standard and assumed in previous work, which makes those works difficult to apply.
- We develop an algorithm that identifies the different pieces of information contained in a single cookie, so that each piece can then be analyzed individually.
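To make the idea of several pieces of information inside one cookie concrete, the following is a minimal sketch in Python. It is not the algorithm from the paper, which is not described on this page. The function name `split_cookie_value`, the separator set, and the sample cookie value are all assumptions chosen for illustration. The sketch URL-decodes a cookie value, splits it on a few common delimiters, and reports each piece with its inner key when one is present.

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import unquote

# Hypothetical separator set (an assumption): real cookies follow many
# more conventions than these five characters.
SEPARATORS = re.compile(r"[&|:;,]")


def split_cookie_value(value: str) -> List[Tuple[Optional[str], str]]:
    """Split one cookie value into (key, piece) pairs.

    The key is None when a piece carries no inner 'key=value' structure.
    """
    decoded = unquote(value)  # undo percent-encoding such as %3D and %7C
    pieces: List[Tuple[Optional[str], str]] = []
    for chunk in SEPARATORS.split(decoded):
        if not chunk:
            continue  # skip empty chunks produced by adjacent separators
        key, sep, val = chunk.partition("=")
        if sep:
            pieces.append((key, val))
        else:
            pieces.append((None, chunk))
    return pieces


# Made-up cookie value that packs a user ID, a timestamp, and a locale.
print(split_cookie_value("uid%3D12345%7Cts%3D1400000000%7Cen-US"))
# [('uid', '12345'), ('ts', '1400000000'), (None, 'en-US')]
```

A fixed separator list like this would miss nested or encoded formats, so it should be read only as an illustration of why a single name=value view of a cookie can hide several distinct values.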