Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations
Authors: Edward Raff, Amol Khanna, Fred Lu
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that this procedure can reduce runtime by a factor of up to 2,200, depending on the value of the privacy parameter ϵ and the sparsity of the dataset. We test our algorithm on high-dimensional problems with up to 8.4 million datapoints and 20 million features on a single machine and find speedups ranging from 10× to 2,200× that of the standard DP Frank-Wolfe method. |
| Researcher Affiliation | Industry | Edward Raff Booz Allen Hamilton EMAIL Amol Khanna Booz Allen Hamilton EMAIL Fred Lu Booz Allen Hamilton EMAIL |
| Pseudocode | Yes | Algorithm 1 Standard Sparse-Aware Frank Wolfe, Algorithm 2 Fast Sparse-Aware Frank-Wolfe for Linear Models, Algorithm 3 Fibonacci Heap Frank-Wolfe Queue Maintenance, Algorithm 4 Big-Step Little-Step Exponential Sampler |
| Open Source Code | No | Our code was written in Java due to the need for explicit looping, and implemented using the JSAT library [36]. There is no explicit statement of code release or a repository link for the specific contributions of this paper. |
| Open Datasets | Yes | Table 2: Datasets used for evaluation. We focus on cases that are high-dimensional and sparse. Dataset (N, D): RCV1 (20,242, 47,236); News20 |
| Dataset Splits | No | The paper uses the phrase "training iterations" and discusses evaluating on "test datasets" but does not explicitly mention validation splits or their proportions. |
| Hardware Specification | Yes | All experiments were run on a machine with 12 CPU cores (though only one core was used), and 128GB of RAM. |
| Software Dependencies | Yes | Our code was written in Java due to the need for explicit looping, and implemented using the JSAT library [36]. |
| Experiment Setup | Yes | Due to the computational requirements of running all tests, we fix the total number of iterations T = 4,000 and maximum L1 norm for the Lasso constraint to be λ = 50 in all tests across all datasets. This value produces highly accurate models in all non-private cases, and the goal of our work is not to perform hyper-parameter tuning but to demonstrate that we have taken an already known algorithm with established convergence rates and made each iteration more efficient. |
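For context on the algorithm family being classified above: the paper builds on the differentially private Frank-Wolfe method for L1-constrained (LASSO) logistic regression, where each iteration selects one coordinate and moves toward a vertex of the L1 ball. The sketch below is an illustrative Python reconstruction, not the paper's released code (which was written in Java using JSAT and is not public). The function names are ours, and the exponential-mechanism coordinate selection is a simplified stand-in that omits the sensitivity calibration a real DP implementation requires.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Mean logistic loss for labels y in {0, 1}, computed stably."""
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)

def frank_wolfe_lasso_logreg(X, y, lam=50.0, T=100, eps=None, rng=None):
    """Frank-Wolfe for logistic regression under ||w||_1 <= lam.

    If eps is given, the coordinate is chosen with a simplified
    exponential mechanism (illustrative only; no sensitivity scaling).
    """
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng() if rng is None else rng
    for t in range(T):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        g = X.T @ (p - y) / n                # gradient of the mean loss
        scores = np.abs(g)
        if eps is None:
            j = int(np.argmax(scores))       # non-private linear oracle
        else:
            # exponential mechanism over coordinates (simplified)
            logits = eps * scores / 2.0
            logits -= logits.max()
            probs = np.exp(logits) / np.exp(logits).sum()
            j = int(rng.choice(d, p=probs))
        gamma = 2.0 / (t + 2.0)              # standard FW step size
        # convex combination with vertex s = -lam * sign(g_j) * e_j
        w *= (1.0 - gamma)
        w[j] += gamma * (-lam * np.sign(g[j]))
    return w
```

Because each update touches only one coordinate of `w`, the iterate stays inside the L1 ball of radius λ by construction; the paper's contribution is making the per-iteration selection step fast on sparse, high-dimensional data rather than changing this basic scheme.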