Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
A Knowledge Transfer Framework for Differentially Private Sparse Learning
Authors: Lingxiao Wang, Quanquan Gu
Pages: 6235-6242
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further demonstrate the superiority of our framework through both synthetic and real-world data experiments. |
| Researcher Affiliation | Academia | Lingxiao Wang, Quanquan Gu Department of Computer Science, University of California, Los Angeles EMAIL |
| Pseudocode | Yes | Algorithm 1: Differentially Private Sparse Learning via Knowledge Transfer (DPSL-KT); Algorithm 2: Iterative Gradient Hard Thresholding (IGHT) |
| Open Source Code | No | No explicit statement or link providing access to the source code for the work described in this paper. |
| Open Datasets | Yes | For real data experiments, we use E2006-TFIDF dataset (Kogan et al. 2009) and RCV1 dataset (Lewis et al. 2004), for the evaluation of sparse linear regression and sparse logistic regression, respectively. |
| Dataset Splits | No | No explicit validation set splits (e.g., specific percentages or sample counts for a validation set) are provided in the paper. The paper mentions training and testing examples from the E2006-TFIDF dataset, and then subdivides the original training set into private and public datasets for their framework, using cross-validation for parameter selection. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) used for running experiments are provided in the text. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are explicitly listed in the paper. |
| Experiment Setup | Yes | For all of our experiments, we choose the parameters of different methods according to the requirements of their theoretical guarantees. More specifically, on the synthetic data experiments, we assume $s^*$ is known for all the methods. On the real data experiments, $s^*$ is unknown, neither our method or the competing methods has the knowledge of $s^*$. So we simply choose a sufficiently large $s$ as a surrogate of $s^*$. Given $s$, for the parameter $\lambda$ in our method, according to Theorem 4.5, we choose $\lambda$ from a sequence of values $c_1 s \log d \log(1/\delta)/(n\epsilon)$, where $c_1 \in \{10^{-6}, 10^{-5}, \ldots, 10^{1}\}$, by cross-validation. For competing methods, given $s$, we choose the iteration number of Frank-Wolfe from a sequence of values $c_2 s$, where $c_2 \in \{0.5, 0.6, \ldots, 1.5\}$, and the regularization parameter in the objective function of Two Stage from a sequence of values $c_3 s/\epsilon$, where $c_3 \in \{10^{-3}, 10^{-2}, \ldots, 10^{2}\}$, by cross-validation. For DP-IGHT, we choose its stepsize from the grid $\{1/2^0, 1/2^1, \ldots, 1/2^6\}$ by cross-validation. For the non-private baseline, we use the non-private IGHT (Yuan, Li, and Zhang 2014). |
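
The candidate grids quoted in the Experiment Setup row can be written out explicitly. The sketch below is illustrative only and is not the authors' code (the paper releases none): the function names are hypothetical, the example values of `s`, `d`, `n`, `epsilon`, and `delta` are made up, and rounding the Frank-Wolfe iteration count to an integer is an assumption. The cross-validation step that picks one value from each grid is not shown.

```python
import math


def lambda_grid(s, d, n, epsilon, delta):
    """Candidates c1 * s * log(d) * log(1/delta) / (n * epsilon), c1 in {1e-6, ..., 1e1}."""
    base = s * math.log(d) * math.log(1.0 / delta) / (n * epsilon)
    return [10.0 ** k * base for k in range(-6, 2)]


def frank_wolfe_iteration_grid(s):
    """Candidates c2 * s for c2 in {0.5, 0.6, ..., 1.5}.

    Rounding to a whole number of iterations (at least 1) is an assumption;
    the quoted text only gives the sequence c2 * s.
    """
    return [max(1, round(k * s / 10)) for k in range(5, 16)]


def two_stage_regularization_grid(s, epsilon):
    """Candidates c3 * s / epsilon for c3 in {1e-3, 1e-2, ..., 1e2}."""
    return [10.0 ** k * s / epsilon for k in range(-3, 3)]


def dp_ight_stepsize_grid():
    """Candidates 1 / 2^k for k = 0, ..., 6."""
    return [1.0 / 2 ** k for k in range(7)]


if __name__ == "__main__":
    # Hypothetical problem sizes and privacy parameters, chosen only for illustration.
    s, d, n, epsilon, delta = 20, 1000, 5000, 1.0, 1e-5
    print("lambda candidates:", lambda_grid(s, d, n, epsilon, delta))
    print("Frank-Wolfe iterations:", frank_wolfe_iteration_grid(s))
    print("Two Stage regularization:", two_stage_regularization_grid(s, epsilon))
    print("DP-IGHT stepsizes:", dp_ight_stepsize_grid())
```

Each function returns the full candidate list, so any cross-validation routine can iterate over it and keep the value with the best held-out error.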