Sara Hajian, Tamir Tassa, Francesco Bonchi
Appeared in Social Network Analysis and Mining, December 2016
Online social networking platforms can collect an incredibly rich set of information about their users: the people they talk to, the people they follow and trust, the people they can influence, as well as their hobbies, interests, and topics in which they are authoritative. Analyzing these data creates fascinating opportunities for expanding our understanding of social structures and phenomena such as social influence, trust and their dynamics. At the same time, mining this type of rich information allows building novel online services, and it represents a great resource for advertisers and for building viral marketing campaigns. Sharing social-network graphs, however, raises important privacy concerns. To alleviate this problem, several anonymization methods have been proposed that aim at reducing the risk of a privacy breach on the published data while still allowing analysts to study the data and draw relevant conclusions. The bulk of those proposals consider publishing only the network structure, that is, a simple (often undirected) graph. In this paper we study the problem of preserving users’ individual privacy when publishing information-rich social networks. In particular, we consider the obfuscation of users’ identities in a topic-dependent social influence network, i.e., a directed graph where each edge is enriched by a topic model that represents the strength of the social influence along the edge per topic. This information-rich graph is much harder to anonymize than standard graphs. We propose to obfuscate the identity of nodes in the network by randomly perturbing the network structure and the topic model. We then formalize our privacy notion, k-obfuscation, and show how to evaluate the level of obfuscation under a strong adversarial assumption.
Experiments on two social networks confirm that randomization can successfully protect the privacy of the users while maintaining high-quality data for applications, such as influence maximization for viral marketing.
- We study, for the first time, the problem of protecting the identity of users when publishing social network data enriched with topic-dependent social influence information, i.e., a topic-dependent social influence network. We propose a method based on random perturbation of the network structure and the associated topic model.
- We formalize our privacy notion, $k$-obfuscation, and show how to evaluate the level of obfuscation achieved by the random perturbation method under a strong adversarial assumption.
- We experiment on two real-world datasets, where the topic-dependent influence associated with each social link is learned from real propagation data, using the expectation-maximization method. In our experiments we report the levels of identity obfuscation achieved and the utility preserved in the perturbed data for different randomization strengths.
- The utility preserved in the randomized data is shown in terms of structural properties of the social graph, and by running topic-aware influence maximization queries on the original and on the perturbed graphs.
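
To make the idea of randomly perturbing both the network structure and the topic model concrete, here is a minimal Python sketch. It is not the paper's algorithm: this summary does not specify the perturbation mechanism, so the scheme below (dropping edges, adding spurious weak edges, and jittering per-topic probabilities) is an assumption, and the function name `perturb_influence_graph` and the parameters `p_remove`, `p_add`, and `noise` are all hypothetical.

```python
import random


def perturb_influence_graph(edges, nodes, p_remove=0.1, p_add=0.1, noise=0.05, seed=0):
    """Randomly perturb a topic-dependent influence graph (illustrative only).

    edges: dict mapping a directed edge (u, v) to a list of per-topic
           influence probabilities, one entry per topic.
    nodes: list of node identifiers.
    The perturbation scheme is an assumption, not the paper's method.
    """
    if not edges:
        return {}
    rng = random.Random(seed)
    num_topics = len(next(iter(edges.values())))
    perturbed = {}

    # Structure + topic model: drop each edge with probability p_remove,
    # otherwise keep it and jitter every topic probability, clamped to [0, 1].
    for (u, v), probs in edges.items():
        if rng.random() < p_remove:
            continue
        perturbed[(u, v)] = [
            min(1.0, max(0.0, p + rng.uniform(-noise, noise))) for p in probs
        ]

    # Structure: add roughly p_add * |E| spurious edges with weak influence.
    n_add = round(p_add * len(edges))
    added, attempts = 0, 0
    while added < n_add and attempts < 100 * max(1, n_add):
        attempts += 1
        u, v = rng.sample(nodes, 2)  # two distinct nodes, so no self-loops
        if (u, v) in edges or (u, v) in perturbed:
            continue
        perturbed[(u, v)] = [rng.uniform(0.0, noise) for _ in range(num_topics)]
        added += 1

    return perturbed


if __name__ == "__main__":
    # Toy graph with two topics per edge (made-up values).
    toy_edges = {("a", "b"): [0.2, 0.9], ("b", "c"): [0.5, 0.0], ("c", "a"): [1.0, 0.3]}
    toy_nodes = ["a", "b", "c", "d"]
    for edge, probs in perturb_influence_graph(toy_edges, toy_nodes, seed=42).items():
        print(edge, [round(p, 3) for p in probs])
```

Raising `p_remove`, `p_add`, or `noise` corresponds loosely to a stronger randomization: identities should become harder to recover, at the cost of utility for tasks such as influence maximization.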