Privacy and Regression Model Preserved Learning
Authors: Jinfeng Yi, Jun Wang, Rong Jin
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our empirical studies with a synthesized dataset and multiple sensitive benchmark datasets verify our theoretical claim as well as the effectiveness of the proposed framework." From the Experiments section: "In this section, we first use simulated data to verify our theoretical claim, i.e., the columns of masked data Z_m = [B; A] approximately lie in the subspace spanned by the columns of sensitive data Z_s = [Y_0; X_0]. We then use two benchmark datasets for regression to verify the effectiveness of the proposed data masking framework." |
| Researcher Affiliation | Collaboration | Jinfeng Yi and Jun Wang (IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA; {jyi,wangjun}@us.ibm.com); Rong Jin (Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA; rongjin@cse.msu.edu) |
| Pseudocode | No | The paper describes optimization problems and mentions the use of the Fixed Point Continuation method but does not provide any pseudocode or algorithm blocks for its proposed framework. |
| Open Source Code | No | The paper does not provide any statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | ADNI dataset (Zhou et al. 2013), from the Alzheimer's Disease Neuroimaging Initiative database. School dataset (Gong, Ye, and Zhang 2012), from the Inner London Education Authority (ILEA). |
| Dataset Splits | Yes | "For both of these two sensitive datasets, we randomly sample 70% of the records as training data to generate the masked data for training regression models. We treat the remaining 30% of records as testing data for evaluating the regression performance. To obtain optimal models, we apply 5-fold cross validations on both masked and sensitive data with the regularization parameter of SVR ranging from 2^-5 to 2^5." |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud resources) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'support vector regression (SVR)' and 'Fixed Point Continuation method' but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | "For both of these two sensitive datasets, we randomly sample 70% of the records as training data to generate the masked data for training regression models. We treat the remaining 30% of records as testing data for evaluating the regression performance. To obtain optimal models, we apply 5-fold cross validations on both masked and sensitive data with the regularization parameter of SVR ranging from 2^-5 to 2^5. The number of masked data m is set to be [n/3]. To verify the robustness of the proposed framework, we vary the variance σ^2 in the range {0.05, 0.1, 0.2, 0.3, 0.4, 0.5}." |
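The paper's theoretical claim, that the columns of the masked data Z_m approximately lie in the subspace spanned by the columns of the sensitive data Z_s, can be checked numerically by projecting Z_m onto the column space of Z_s and measuring the relative residual. The sketch below uses synthetic stand-in matrices (the dimensions, noise level, and construction of Z_m are assumptions for illustration, not the paper's masking procedure):

```python
import numpy as np

rng = np.random.RandomState(0)
Zs = rng.randn(50, 8)                      # stand-in for sensitive data Z_s = [Y_0; X_0]
coeff = rng.randn(8, 5)
Zm = Zs @ coeff + 1e-3 * rng.randn(50, 5)  # stand-in masked data near span(columns of Zs)

# Least-squares projection of each column of Zm onto the column space of Zs
proj, *_ = np.linalg.lstsq(Zs, Zm, rcond=None)
residual = np.linalg.norm(Zm - Zs @ proj) / np.linalg.norm(Zm)
# A small relative residual indicates Zm's columns nearly lie in span(columns of Zs)
```

If the subspace claim holds for the paper's actual masking construction, the relative residual computed this way would stay small as the noise variance grows over the tested range.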
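The evaluation protocol quoted above (70%/30% train/test split, 5-fold cross-validation over an SVR regularization parameter in {2^-5, ..., 2^5}) can be sketched as follows. This is a reconstruction under assumptions, not the authors' code: the data below is a synthetic placeholder for the ADNI/School datasets, the kernel choice is assumed, and the paper's data masking step is omitted:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.RandomState(0)
X = rng.randn(200, 10)                      # placeholder features (stand-in for ADNI/School data)
y = X @ rng.randn(10) + 0.1 * rng.randn(200)

# 70% of records for training, remaining 30% for testing, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=0)

# 5-fold cross-validation over the SVR regularization parameter C = 2^-5 ... 2^5
grid = GridSearchCV(SVR(kernel="linear"),
                    {"C": [2.0 ** k for k in range(-5, 6)]},
                    cv=5)
grid.fit(X_tr, y_tr)

# Regression performance on the held-out 30%
rmse = float(np.sqrt(np.mean((grid.predict(X_te) - y_te) ** 2)))
```

In the paper this protocol is run twice, once on the sensitive data and once on the masked data, so that the regression performance of the two can be compared.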