Towards transparency and privacy in the online advertising business

User profiling in the time of HTTPS

Online user profiling is a profitable business carried out extensively by third parties such as search engines, ad networks and network providers. It leverages browsing activity to infer users' interests and intentions. In some cases this activity is unwelcome to Internet users, who feel that it violates their privacy.

This profiling is usually done by companies in cooperation with the owners of the websites that offer services to users. Nevertheless, the interest of network providers (and other possible network eavesdroppers) in increasing their revenue by analyzing their users' data is not new. In 2006, NebuAd started using Deep Packet Inspection (DPI) to provide advertising solutions.

HTTPS enhances online users' privacy by encrypting the communication between a browser and a web server using TLS. It effectively protects personal data from being observed by network eavesdroppers. However, it was not designed to prevent the profiling of users.
Researchers from NEC Laboratories Europe and Telefonica I+D (involved in the TYPES project) have investigated the accuracy of the profiles a network eavesdropper could infer in a web where HTTPS is used everywhere. This work shows that, even when TLS is used, a network eavesdropper can still infer the hosts users are visiting from the Server Name Indication (SNI) extension of TLS or from DNS requests. Knowing the contacted host is enough to correctly profile users who visit websites with very homogeneous content, such as those related to sports. For other sites, however, the host is not a clear indication of the user's interest. For example, a network eavesdropper who knows that a user has visited Amazon.com cannot infer whether she is interested in buying clothes, electronic devices or even cloud services.
Furthermore, the study shows how simple off-the-shelf traffic classification would help an advanced network eavesdropper generate a correct user profile even for pages with heterogeneous content.
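
To make the host-based inference described above more concrete, here is a minimal Python sketch of how hostnames seen in SNI or DNS traffic could be turned into a coarse interest profile. It is only an illustration and not the method used in the study: the `HOST_CATEGORIES` map, the example hostnames and the helper names `registered_domain` and `profile_from_hosts` are all hypothetical.

```python
from collections import Counter

# Hypothetical host-to-category map (an assumption for this sketch; none of
# these entries come from the study). None marks a host whose content is too
# heterogeneous for the hostname alone to reveal the user's interest.
HOST_CATEGORIES = {
    "espn.com": "sports",
    "marca.com": "sports",
    "amazon.com": None,
}


def registered_domain(host):
    """Naively keep the last two labels (ignores public suffixes such as co.uk)."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def profile_from_hosts(hosts):
    """Count interest categories for hostnames observed in SNI or DNS traffic."""
    profile = Counter()
    ambiguous = 0
    for host in hosts:
        category = HOST_CATEGORIES.get(registered_domain(host))
        if category is None:
            # Unknown host, or a host that does not reveal the interest.
            ambiguous += 1
        else:
            profile[category] += 1
    return profile, ambiguous


observed = ["www.espn.com", "www.amazon.com", "marca.com", "www.amazon.com"]
profile, ambiguous = profile_from_hosts(observed)
print(dict(profile), ambiguous)
```

In this toy example the two sports sites contribute directly to the profile, while the two visits to Amazon.com stay ambiguous. That is the gap which, according to the study, traffic classification could help an eavesdropper close.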

* More details about this study can be found at:

R. González, C. Soriente, N. Laoutaris
To be presented at the 16th Internet Measurement Conference (IMC), 2016.

About the author
Roberto González | NEC

