Information Laundering for Model Privacy

Authors: Xinran Wang, Yu Xiang, Jun Gao, Jie Ding

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also provide some experimental studies to illustrate the concepts."
Researcher Affiliation | Academia | Xinran Wang, School of Statistics, University of Minnesota-Twin Cities, Minneapolis, MN 55455, USA (wang8740@umn.edu); Yu Xiang, Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112, USA (yu.xiang@utah.edu); Jun Gao, Department of Mathematics, Stanford University, Stanford, CA 94305, USA (jung2@stanford.edu); Jie Ding, School of Statistics, University of Minnesota-Twin Cities, Minneapolis, MN 55455, USA (dingj@umn.edu)
Pseudocode | Yes | Algorithm 1, Optimized Information Laundering (OIL), and Algorithm 2, OIL-Y (a special case of Algorithm 1, in matrix form); a hedged sketch of the OIL-Y step appears below the table.
Open Source Code | No | The paper does not provide a link or statement indicating that its source code is publicly available.
Open Datasets | Yes | "In this experimental study, we use the 20-newsgroups dataset provided by the scikit-learn open-source library (Scikit-learn, 2020d)... we use the life expectancy dataset provided by Kaggle open-source data (Kaggle, 2020)... Alice uses half of the Breast Cancer dataset (Scikit-learn, 2020b)..."
Dataset Splits | Yes | "To evaluate the out-sample utility, we split the data into two parts using the default option provided in (Scikit-learn, 2020d), which results in a training part (2245 samples, 49914 features) and a testing part (1494 samples, 49914 features)." A data-loading and split sketch appears below the table.
Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., CPU or GPU models, or memory).
Software Dependencies | No | The paper mentions using the "scikit-learn open-source library" but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | Alice trains a classifier using the Naive Bayes method and records the frequency of observing each category, [0.22, 0.27, 0.21, 0.30] (r in Algorithm 2). Then, Alice runs the OIL-Y algorithm (under a given β2) to obtain the transition probability matrix P ∈ [0, 1]^{4×4}. In the regression model, the output alphabet Y is quantized to 30 points equally spaced between µ − 3σ and µ + 3σ, where µ and σ denote the mean and standard deviation of Y in the training data. In Figure 8(a), Alice uses half of the Breast Cancer dataset (Scikit-learn, 2020b), standardized, to train a logistic classification model. In Figure 9, Alice uses half of the simulated Moons dataset (Scikit-learn, 2020c) (1000 samples, noise standard deviation 0.1) to train a Random Forest model. Sketches of these steps follow the table.
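
The sketches below reconstruct the setup quoted in the table; they are written under stated assumptions, not taken from the authors' (unreleased) code. First, loading 20-newsgroups with its built-in train/test split, training the Naive Bayes model, and recording the class-frequency vector r. The four category names are hypothetical placeholders: the quoted text implies a 4-class task (r has four entries) but does not name the categories, so the sample counts printed here will differ from the reported 2245/1494 unless the paper's categories are used.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical 4-category choice; the paper's category list is not quoted.
categories = ["rec.autos", "sci.med", "comp.graphics", "talk.politics.guns"]

# The "default option" above is the fixed train/test split that ships with
# 20-newsgroups, selected via the `subset` argument.
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

vec = CountVectorizer().fit(train.data)
X_train, X_test = vec.transform(train.data), vec.transform(test.data)

# Alice's model, plus the empirical class frequencies r used by Algorithm 2.
clf = MultinomialNB().fit(X_train, train.target)
r = np.bincount(train.target) / train.target.size
print(X_train.shape, X_test.shape, r.round(2))
```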
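Next, the OIL-Y step: mapping the frequency vector r and a trade-off β2 to a row-stochastic transition matrix P ∈ [0, 1]^{4×4}. The paper's exact update is not reproduced here; this is a minimal Blahut-Arimoto-style stand-in that plays the same role, with a 0/1 distortion between labels so that larger β2 keeps P closer to the identity (less randomization) and smaller β2 spreads more mass off the diagonal. The objective and update rule are assumptions, not the paper's verbatim algorithm.

```python
import numpy as np

def laundering_matrix(r, beta2, n_iters=500, tol=1e-10):
    """Alternate channel/marginal updates until the row-stochastic
    transition matrix P (shape [K, K]) stabilizes."""
    r = np.asarray(r, dtype=float)
    K = r.size
    d = 1.0 - np.eye(K)        # 0/1 distortion between label pairs
    q = r.copy()               # output marginal, initialized at r
    for _ in range(n_iters):
        # Channel update: row i proportional to q(j) * exp(-beta2 * d[i, j]).
        P = q[None, :] * np.exp(-beta2 * d)
        P /= P.sum(axis=1, keepdims=True)
        # Marginal update: distribution of the laundered output under P.
        q_new = r @ P
        if np.max(np.abs(q_new - q)) < tol:
            break
        q = q_new
    return P

# Frequencies quoted in the experiment setup above.
P = laundering_matrix([0.22, 0.27, 0.21, 0.30], beta2=2.0)
print(P.round(3))   # rows sum to 1; smaller beta2 means more off-diagonal mass
```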
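For the regression experiment, the quoted quantization rule (30 equally spaced points between µ − 3σ and µ + 3σ, with µ and σ computed from the training outputs) translates directly; the helper name below is hypothetical.

```python
import numpy as np

def quantize_outputs(y_train, y_pred, n_points=30):
    """Snap continuous predictions onto n_points equally spaced values in
    [mu - 3*sigma, mu + 3*sigma], with mu, sigma from the training outputs."""
    y_pred = np.asarray(y_pred, dtype=float)
    mu, sigma = np.mean(y_train), np.std(y_train)
    grid = np.linspace(mu - 3 * sigma, mu + 3 * sigma, n_points)
    # Nearest grid point for each prediction.
    return grid[np.abs(y_pred[:, None] - grid[None, :]).argmin(axis=1)]
```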
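Finally, the two classifier setups quoted for Figures 8(a) and 9. The random seeds are assumptions added for repeatability, and any hyperparameters beyond those stated are scikit-learn defaults.

```python
from sklearn.datasets import load_breast_cancer, make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Figure 8(a): half of the standardized Breast Cancer data trains a
# logistic classification model.
X, y = load_breast_cancer(return_X_y=True)
X_alice, _, y_alice, _ = train_test_split(X, y, train_size=0.5, random_state=0)
logit = LogisticRegression(max_iter=1000).fit(
    StandardScaler().fit_transform(X_alice), y_alice)

# Figure 9: 1000 Moons samples with noise std 0.1; half trains a random forest.
Xm, ym = make_moons(n_samples=1000, noise=0.1, random_state=0)
Xm_alice, _, ym_alice, _ = train_test_split(Xm, ym, train_size=0.5,
                                            random_state=0)
forest = RandomForestClassifier(random_state=0).fit(Xm_alice, ym_alice)
```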