Privacy-Preserving Classification of Personal Text Messages with Secure Multi-Party Computation

Authors: Devin Reich, Ariel Todoki, Rafael Dowsley, Martine De Cock, Anderson Nascimento

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform end-to-end experiments with an application for detecting hate speech against women and immigrants, demonstrating excellent runtime results without loss of accuracy. We evaluate the proposed protocols in a use case for the detection of hate speech in short text messages, using data from [6]."
Researcher Affiliation | Academia | Devin Reich1, Ariel Todoki1, Rafael Dowsley2, Martine De Cock1, Anderson Nascimento1 — 1School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA 98402 ({dreich,atodoki,mdecock,andclay}@uw.edu); 2Department of Computer Science, Bar-Ilan University, 5290002 Ramat-Gan, Israel (rafael@dowsley.net)
Pseudocode | Yes | The paper presents structured steps for protocols such as "Protocol EQ", "Protocol FE", "Protocol AB", "Protocol TC LR", and "Protocol TC AB". For example: "Protocol AB: Alice and Bob hold secret sharings [[xi]]q of each of the n binary features xi. Bob holds the trained AdaBoost model which consists of two weighted probability vectors y..."
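The protocols above operate on additive secret sharings over Z_q, where each party holds a random-looking share and linear operations (adding shares, multiplying by a constant known to one party) can be done locally. The following is a minimal illustrative sketch of that building block only — the share/reconstruct primitives and a locally computed weighted score over shared binary features. It is not the paper's Protocol AB (which additionally needs secure comparison and Bob's model to stay private); the modulus, weights, and features here are hypothetical.

```python
import random

Q = 2**62  # illustrative modulus for additive secret sharing over Z_q


def share(x, q=Q):
    """Split x into two additive shares with x = (a + b) mod q."""
    a = random.randrange(q)
    b = (x - a) % q
    return a, b


def reconstruct(a, b, q=Q):
    """Recombine the two shares."""
    return (a + b) % q


# Alice's binary feature vector, secret-shared between Alice and Bob.
features = [1, 0, 1, 1]
pairs = [share(x) for x in features]
alice_shares = [a for a, _ in pairs]
bob_shares = [b for _, b in pairs]

# Hypothetical public weights: scaling shares by a public constant and
# summing are purely local operations under additive sharing.
weights = [3, 5, 2, 7]
alice_score = sum(w * a for w, a in zip(weights, alice_shares)) % Q
bob_score = sum(w * b for w, b in zip(weights, bob_shares)) % Q

score = reconstruct(alice_score, bob_score)
assert score == sum(w * x for w, x in zip(weights, features))  # 3 + 2 + 7 = 12
```

In the actual protocols the final comparison (e.g. thresholding the score) is also done securely, rather than by reconstructing intermediate values as this toy does.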
Open Source Code | Yes | "We build our protocols using a privacy-preserving machine learning (PPML) framework based on SMC developed by us" (footnote: https://bitbucket.org/uwtppml)
Open Datasets | Yes | "We evaluate the proposed protocols in a use case for the detection of hate speech in short text messages, using data from [6]." — [6] Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Rangel, Paolo Rosso, and Manuela Sanguinetti. SemEval-2019 Task 5: Multilingual detection of hate speech against immigrants and women in Twitter.
Dataset Splits | Yes | "The models are evaluated using 5-fold cross-validation over the entire corpus of 10,000 tweets."
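The 5-fold cross-validation described above partitions the 10,000 tweets into five disjoint folds, each serving once as the held-out test set. A minimal stdlib-only sketch of such a splitter (the seed and fold assignment are assumptions, not the paper's):

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# Over 10,000 tweets: each fold holds out 2,000 for testing,
# trains on the remaining 8,000, and every tweet is tested exactly once.
splits = list(kfold_indices(10_000, k=5))
```

Reported accuracy is then the average over the five test folds.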
Hardware Specification | Yes | "We implemented the protocols from Section 3 in Java and ran experiments on AWS c5.9xlarge machines with 36 vCPUs, 72.0 GiB memory."
Software Dependencies | No | The paper states, "We implemented the protocols from Section 3 in Java," but does not specify a Java version or any other software dependencies with version numbers.
Experiment Setup | Yes | "The models are evaluated using 5-fold cross-validation over the entire corpus of 10,000 tweets. Each runtime experiment was repeated 3 times and average results are reported. Table 1 shows accuracy results for a variety of models trained to classify a tweet as hate speech vs. non-hate speech, including tree ensemble models consisting of 50, 200, and 500 decision stumps, and LR models trained on 50, 200, and 500, and all features."
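The tree ensembles evaluated above consist of decision stumps: one-level trees that each test a single binary feature and cast a weighted vote, with the ensemble classifying by the sign of the total. A hypothetical sketch of that inference step (the stump list and threshold are illustrative, not the paper's trained model):

```python
def stump_ensemble_predict(x, stumps, threshold=0.0):
    """Classify binary feature vector x with a weighted vote of
    decision stumps, each given as (feature_index, weight)."""
    score = sum(w if x[i] else -w for i, w in stumps)
    return 1 if score > threshold else 0  # 1 = hate speech


# Toy ensemble of two stumps over a 2-feature input.
stumps = [(0, 1.0), (1, 2.0)]
assert stump_ensemble_predict([1, 0], stumps) == 0  # score = 1 - 2 = -1
assert stump_ensemble_predict([1, 1], stumps) == 1  # score = 1 + 2 = 3
```

In the secure setting, this same weighted vote is what Protocol AB evaluates over secret-shared features without revealing Bob's stump weights to Alice.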