Information Laundering for Model Privacy
Authors: Xinran Wang, Yu Xiang, Jun Gao, Jie Ding
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide some experimental studies to illustrate the concepts |
| Researcher Affiliation | Academia | Xinran Wang, School of Statistics, University of Minnesota-Twin Cities, Minneapolis, MN 55455, USA, wang8740@umn.edu; Yu Xiang, Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112, USA, yu.xiang@utah.edu; Jun Gao, Department of Mathematics, Stanford University, Stanford, CA 94305, USA, jung2@stanford.edu; Jie Ding, School of Statistics, University of Minnesota-Twin Cities, Minneapolis, MN 55455, USA, dingj@umn.edu |
| Pseudocode | Yes | Algorithm 1 Optimized Information Laundering (OIL) and Algorithm 2 OIL-Y (a special case of Algorithm 1, in the matrix form) |
| Open Source Code | No | The paper does not provide a specific link or statement indicating that its source code is publicly available. |
| Open Datasets | Yes | In this experimental study, we use the 20-newsgroups dataset provided by the scikit-learn open-source library (Scikit-learn, 2020d)... we use the life expectancy dataset provided by Kaggle open-source data (Kaggle, 2020)... Alice uses half of the Breast Cancer dataset (Scikit-learn, 2020b)... |
| Dataset Splits | Yes | To evaluate the out-sample utility, we split the data into two parts using the default option provided in (Scikit-learn, 2020d), which results in a training part (2245 samples, 49914 features) and a testing part (1494 samples, 49914 features). |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., CPU, GPU models, or memory). |
| Software Dependencies | No | The paper mentions using 'scikit-learn open-source library' but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Alice trains a classifier using the Naive Bayes method and records the frequency of observing each category, [0.22, 0.27, 0.21, 0.30] (r in Algorithm 2). Then, Alice runs the OIL-Y Algorithm (under a given β₂) to obtain the transition probability matrix P ∈ [0, 1]^(4×4). In the regression model, we quantize the output alphabet Y by 30 points equally spaced between µ ± 3σ, where µ, σ represent the mean and the standard deviation of Y in the training data. In Figure 8(a), Alice uses half of the Breast Cancer dataset (Scikit-learn, 2020b) (standardized) to train a Logistic classification model. In Figure 9, Alice uses half of the simulated Moons dataset (Scikit-learn, 2020c) (with 1000 samples, 0.1 standard deviation for the noise) to train a Random Forest model. |
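The dataset and split quoted in the Open Datasets and Dataset Splits rows above can be reproduced with scikit-learn's built-in loaders. A minimal sketch, assuming a four-category subset and TF-IDF featurization; the exact category list and vectorizer settings are not quoted in this report, so the placeholders below should be adjusted until the shapes match the reported (2245, 49914) and (1494, 49914):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical category choice: the report only says four categories with
# empirical frequencies roughly [0.22, 0.27, 0.21, 0.30] were used.
categories = ["alt.atheism", "comp.graphics", "sci.space", "talk.religion.misc"]

# scikit-learn's default option: the corpus ships with fixed "train"/"test" subsets.
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

# Featurization is an assumption here; the paper's vectorizer is not quoted.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

print(X_train.shape, X_test.shape)  # paper reports (2245, 49914) and (1494, 49914)
```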
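Continuing the sketch for the classification setup quoted in the Experiment Setup row: Alice's Naive Bayes model and the category-frequency vector r that Algorithm 2 (OIL-Y) takes as input. MultinomialNB is an assumption; the quoted text only says "the Naive Bayes method".

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Alice's classifier; the paper names "Naive Bayes" without a variant, so
# MultinomialNB (a natural fit for text features) is assumed here.
clf = MultinomialNB()
clf.fit(X_train, train.target)

# Empirical frequency of each category in the training data: the vector r
# in Algorithm 2 (OIL-Y). The paper reports roughly [0.22, 0.27, 0.21, 0.30].
r = np.bincount(train.target) / len(train.target)
```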
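Algorithm 2 itself (the optimization that produces P under a given β₂) is not reproduced in this report. One reading of the quoted setup is that, given the row-stochastic transition matrix P ∈ [0, 1]^(4×4) returned by OIL-Y, releasing a laundered prediction amounts to sampling the published label from the row of P indexed by the model's raw prediction. A sketch under that reading, continuing from the blocks above; the matrix P below is a placeholder, not the paper's optimized one:

```python
rng = np.random.default_rng(0)

def launder(P, y_pred, rng):
    """Sample a released label from row P[y] for each raw prediction y.

    P is assumed row-stochastic: P[y, z] = Pr(release z | model predicts y).
    """
    return np.array([rng.choice(len(P), p=P[y]) for y in y_pred])

# Placeholder transition matrix (rows sum to 1); the paper's P comes from
# running OIL-Y under a chosen beta_2.
P = np.full((4, 4), 0.1) + 0.6 * np.eye(4)
released = launder(P, clf.predict(X_test), rng)
```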
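For the regression experiment on the life-expectancy data, the quoted setup quantizes the output alphabet Y to 30 equally spaced points in [µ − 3σ, µ + 3σ]. A minimal sketch, with y_train standing in for the (hypothetical) training targets:

```python
import numpy as np

# Placeholder for the life-expectancy training targets; the real values
# come from the Kaggle dataset cited above.
y_train = np.random.default_rng(0).normal(70.0, 8.0, 500)

# 30 equally spaced quantization points between mu - 3*sigma and mu + 3*sigma,
# with mu and sigma taken from the training targets, as quoted.
mu, sigma = y_train.mean(), y_train.std()
grid = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 30)

def quantize(y):
    """Snap each continuous prediction to its nearest grid point."""
    y = np.atleast_1d(y)
    return grid[np.argmin(np.abs(grid[None, :] - y[:, None]), axis=1)]
```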