A Knowledge Transfer Framework for Differentially Private Sparse Learning

Authors: Lingxiao Wang, Quanquan Gu

AAAI 2020, pp. 6235-6242

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further demonstrate the superiority of our framework through both synthetic and real-world data experiments.
Researcher Affiliation | Academia | Lingxiao Wang, Quanquan Gu, Department of Computer Science, University of California, Los Angeles, {lingxw, qgu}@cs.ucla.edu
Pseudocode | Yes | Algorithm 1: Differentially Private Sparse Learning via Knowledge Transfer (DPSL-KT); Algorithm 2: Iterative Gradient Hard Thresholding (IGHT)
Open Source Code | No | No explicit statement or link providing access to the source code for the work described in this paper.
Open Datasets | Yes | For real data experiments, we use the E2006-TFIDF dataset (Kogan et al. 2009) and the RCV1 dataset (Lewis et al. 2004) for the evaluation of sparse linear regression and sparse logistic regression, respectively.
Dataset Splits | No | No explicit validation set splits (e.g., specific percentages or sample counts) are provided. The paper reports training and testing examples for the E2006-TFIDF dataset, subdivides the original training set into private and public datasets for the proposed framework, and selects parameters by cross-validation.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments are provided in the text.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are explicitly listed in the paper.
Experiment Setup | Yes | For all of our experiments, we choose the parameters of the different methods according to the requirements of their theoretical guarantees. More specifically, in the synthetic data experiments we assume s is known for all methods. In the real data experiments s is unknown, and neither our method nor the competing methods has knowledge of s, so we simply choose a sufficiently large value of s as a surrogate for the true sparsity. Given s, for the parameter λ in our method, according to Theorem 4.5, we choose λ from a sequence of values c1 · s log d · log(1/δ)/(nϵ), where c1 ∈ {10^-6, 10^-5, ..., 10^1}, by cross-validation. For the competing methods, given s, we choose the iteration number of Frank-Wolfe from a sequence of values c2 · s, where c2 ∈ {0.5, 0.6, ..., 1.5}, and the regularization parameter in the objective function of Two Stage from a sequence of values c3 · s/ϵ, where c3 ∈ {10^-3, 10^-2, ..., 10^2}, by cross-validation. For DP-IGHT, we choose its stepsize from the grid {1/2^0, 1/2^1, ..., 1/2^6} by cross-validation. For the non-private baseline, we use the non-private IGHT (Yuan, Li, and Zhang 2014).
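The parameter grids quoted in the Experiment Setup row translate directly into code. Below is a minimal sketch of how those grids could be built and how λ could be picked by cross-validation; the helper names fit_dpsl_kt and validation_error are hypothetical placeholders (the paper releases no code), and only the grid definitions themselves come from the text above.

```python
import numpy as np

def candidate_grids(s, n, d, eps, delta):
    """Hyperparameter grids as described in the Experiment Setup row."""
    # lambda candidates: c1 * s * log d * log(1/delta) / (n * eps), c1 in {1e-6, ..., 1e1}
    lam_grid = [c1 * s * np.log(d) * np.log(1.0 / delta) / (n * eps)
                for c1 in 10.0 ** np.arange(-6, 2)]
    # Frank-Wolfe iteration counts: c2 * s, c2 in {0.5, 0.6, ..., 1.5}
    fw_iters = [int(round(c2 * s)) for c2 in np.arange(0.5, 1.51, 0.1)]
    # Two Stage regularization parameters: c3 * s / eps, c3 in {1e-3, ..., 1e2}
    two_stage_reg = [c3 * s / eps for c3 in 10.0 ** np.arange(-3, 3)]
    # DP-IGHT stepsizes: 1/2^0, 1/2^1, ..., 1/2^6
    ight_steps = [1.0 / 2 ** k for k in range(7)]
    return lam_grid, fw_iters, two_stage_reg, ight_steps

def select_lambda(lam_grid, fit_dpsl_kt, validation_error):
    """Cross-validation over the lambda grid: fit each candidate, keep the best-scoring one."""
    # fit_dpsl_kt and validation_error are hypothetical callables, not the authors' code.
    return min(lam_grid, key=lambda lam: validation_error(fit_dpsl_kt(lam)))
```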
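The non-private IGHT baseline (Yuan, Li, and Zhang 2014) referenced above alternates a gradient step with hard thresholding onto the s largest-magnitude coordinates. A minimal sketch for sparse linear regression is given below, assuming a squared loss; the private DP-IGHT variant additionally perturbs the updates with noise, which is not shown here.

```python
import numpy as np

def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w and zero out the rest."""
    out = np.zeros_like(w)
    top = np.argsort(np.abs(w))[-s:]
    out[top] = w[top]
    return out

def ight_least_squares(X, y, s, eta=0.1, iters=100):
    """Non-private IGHT for sparse linear regression: gradient step, then hard thresholding."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n   # gradient of the average squared loss
        w = hard_threshold(w - eta * grad, s)
    return w
```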