In many settings, competing parties that own proprietary confidential databases need to mine the data which they hold in order to distill from them meaningful findings that could be used to improve their performances and benefit their businesses. By sharing their respective private databases, in order to increase the size and diversity of the training data, they can improve the quality and accuracy of the obtained data mining results. However, such competing parties may be reluctant to share their data as it may lose them their commercial advantage; in addition, sharing such data may be inconsistent with privacy legislation, in case those databases hold information on individuals. This conflict raises the question: to share or not to share?
The advent of secure multi-party computation, a central field of study in cryptography, and the closely related field of privacy preserving data mining, enabled answering this question by replacing the conjunctive article “or’’ with “and’’. Algorithms that were designed over the last three and a half decades enable mutually non-trusting parties to share their respective data while at the same time not sharing them, in the following sense: they allow those parties to perform computations on the union of inputs that each of them owns, without actually revealing to each other the value of those private inputs. For example, the first such problem that was introduced by Andrew Yao in 1982 is called The Millionaires’ Problem. It allows two parties to know which one of them holds a larger number (that could reflect their wealth) without actually revealing to each other those numbers. The problems that were addressed and solved over the years enable much more complex computations: For example, several hospitals can draw conclusions on responses of patients to medical procedures and medications, based on the union of the databases which they hold, without revealing to each other the content of their databases (in order to respect their patients’ privacy).
Recently, the need for performing privacy preserving data mining became most apparent when the private databases take the form of social networks. Nowadays, social networking platforms, such as Facebook or Twitter, have realized that their proprietary social graph is an important asset with inestimable value, thus they keep it secret for obvious reasons of commercial benefits, as well as due to privacy legislation. At the same time, as users are actively taking part in more than one social network, the analysis of social dynamics can benefit greatly from performing data mining collaborately on several social networks. For example, a fundamental problem in viral marketing is the problem of influence maximization: if a company wishes to advertise a new product, and it has a specified budget allocated for the purpose of “pushing’’ the new product to a small set of users, it is desirable to find an initial set of users that, based on previous records of their influence on peer users, the expected resulting spread of the new product, as driven by word of mouth viral cascades, will be maximal. One of the approaches to solve this computational problem is to compute for each user in the underlying society a score that indicates her centrality (or influence or prestige) in that society. If each social network platform performs that computation separately based only on evidence of information propagation that it witnesses within its network, that computation can be made without leaking sensitive information to other platforms. Alas, such a computation would give less useful results than a computation that considers simultaneously several networks. Indeed, for viral marketing purposes, it does not matter what is the social network that serves as the vehicle of information propagation amongst users; as long as the message propagates by word-of-mouth, the desired viral marketing dynamics is taking place. Hence, the identification of the most central or influential users would better rely on multiple social network platforms.
In the framework of the Horizon 2020 project TYPES, we developed multi-party protocols that perform such computations while being secure in the information theoretic sense (namely, they reveal zero information to the interacting parties). Our main protocol can be executed on today’s large-scale social networks within few minutes. We believe that such protocols will be adopted by the industry in the coming years in order to allow efficient execution of viral marketing campaigns, while respecting the commercial concerns of social network platforms and the private information of all of us.