Privacy and Regression Model Preserved Learning

Authors: Jinfeng Yi, Jun Wang, Rong Jin

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical studies with a synthesized dataset and multiple sensitive benchmark datasets verify our theoretical claim as well as the effectiveness of the proposed framework." From the Experiments section: "In this section, we first use simulated data to verify our theoretical claim, i.e., the columns of masked data Z_m = [B; A] approximately lie in the subspace spanned by the columns of sensitive data Z_s = [Y_0; X_0]. We then use two benchmark datasets for regression to verify the effectiveness of the proposed data masking framework." (A generic numerical check of this subspace claim is sketched after the table.)
Researcher Affiliation | Collaboration | Jinfeng Yi, Jun Wang: IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA ({jyi,wangjun}@us.ibm.com); Rong Jin: Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA (rongjin@cse.msu.edu)
Pseudocode | No | The paper describes optimization problems and mentions the use of the Fixed Point Continuation method, but it does not provide any pseudocode or algorithm blocks for its proposed framework.
Open Source Code | No | The paper does not provide any statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | ADNI dataset (Zhou et al. 2013), from the Alzheimer's Disease Neuroimaging Initiative database; School dataset (Gong, Ye, and Zhang 2012), from the Inner London Education Authority (ILEA).
Dataset Splits | Yes | "For both of these two sensitive datasets, we randomly sample 70% of the records as training data to generate the masked data for training regression models. We treat the remaining 30% of records as testing data for evaluating the regression performance. To obtain optimal models, we apply 5-fold cross validations on both masked and sensitive data with the regularization parameter of SVR ranging from 2^-5 to 2^5."
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU/GPU models, memory, or cloud resources) used to run the experiments.
Software Dependencies | No | The paper mentions 'support vector regression (SVR)' and the 'Fixed Point Continuation method' but does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "For both of these two sensitive datasets, we randomly sample 70% of the records as training data to generate the masked data for training regression models. We treat the remaining 30% of records as testing data for evaluating the regression performance. To obtain optimal models, we apply 5-fold cross validations on both masked and sensitive data with the regularization parameter of SVR ranging from 2^-5 to 2^5. The number of masked data m is set to be [n/3]. To verify the robustness of the proposed framework, we vary the variance σ² in the range {0.05, 0.1, 0.2, 0.3, 0.4, 0.5}." (A sketch of this split, cross-validation, and σ² sweep follows the table.)
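
The Research Type row quotes the paper's central empirical claim: the columns of the masked data Z_m = [B; A] should approximately lie in the subspace spanned by the columns of the sensitive data Z_s = [Y_0; X_0]. Since no code is released, the following is only a minimal sketch of how such a claim can be checked numerically. The synthetic construction of Z_s and Z_m (the mixing matrix W, the dimensions, and the Gaussian noise with variance sigma2) is an assumption for illustration and is not the paper's masking procedure.

```python
import numpy as np

def subspace_residuals(Z_s, Z_m):
    """Relative distance of each column of Z_m from the column space of Z_s."""
    # Orthonormal basis for span(columns of Z_s)
    Q, _ = np.linalg.qr(Z_s)
    # Project the masked columns onto that span and measure what is left over
    residual = Z_m - Q @ (Q.T @ Z_m)
    return np.linalg.norm(residual, axis=0) / np.linalg.norm(Z_m, axis=0)

# Hypothetical toy data (NOT the paper's masking algorithm): masked columns are
# random combinations of the sensitive columns plus Gaussian noise of variance sigma2.
rng = np.random.default_rng(0)
d, n, sigma2 = 200, 30, 0.1
m = n // 3                                    # m masked points, roughly n/3 as in the paper
Z_s = rng.standard_normal((d, n))             # stand-in for [Y_0; X_0]
W = rng.standard_normal((n, m))
Z_m = Z_s @ W + np.sqrt(sigma2) * rng.standard_normal((d, m))   # stand-in for [B; A]

print("mean relative residual:", subspace_residuals(Z_s, Z_m).mean())
```

Small residuals indicate that the masked columns indeed lie close to the span of the sensitive columns; in this toy construction they shrink as sigma2 decreases.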
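The Dataset Splits and Experiment Setup rows describe a 70%/30% train/test split, 5-fold cross-validation, and an SVR regularization parameter swept over 2^-5 to 2^5, with the noise variance σ² varied over {0.05, ..., 0.5}. Below is a minimal sketch of that evaluation protocol using scikit-learn; the synthetic regression data and the `mask_data` stub stand in for the ADNI/School datasets and the paper's masking framework, both of which are assumptions made here for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def mask_data(X, y, sigma2, rng):
    """Placeholder for the paper's masking framework: here, just additive Gaussian noise."""
    return X + rng.standard_normal(X.shape) * np.sqrt(sigma2), y

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=400, n_features=20, noise=5.0, random_state=0)

# 70% training / 30% testing, as described in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# SVR regularization parameter C swept over 2^-5 ... 2^5 with 5-fold cross-validation
param_grid = {"C": [2.0 ** k for k in range(-5, 6)]}

for sigma2 in [0.05, 0.1, 0.2, 0.3, 0.4, 0.5]:
    X_masked, y_masked = mask_data(X_tr, y_tr, sigma2, rng)
    search = GridSearchCV(SVR(), param_grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X_masked, y_masked)
    mse = mean_squared_error(y_te, search.predict(X_te))
    print(f"sigma^2={sigma2:.2f}  best C={search.best_params_['C']:.4g}  test MSE={mse:.2f}")
```

Comparing the test error of models trained on masked versus unmasked data, across the σ² grid, mirrors the robustness evaluation the quoted setup describes.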