Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations
Authors: Edward Raff, Amol Khanna, Fred Lu
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that this procedure can reduce runtime by a factor of up to 2,200×, depending on the value of the privacy parameter ϵ and the sparsity of the dataset. We test our algorithm on high-dimensional problems with up to 8.4 million datapoints and 20 million features on a single machine and find speedups ranging from 10× to 2,200× that of the standard DP Frank-Wolfe method. |
| Researcher Affiliation | Industry | Edward Raff Booz Allen Hamilton raff_edward@bah.com Amol Khanna Booz Allen Hamilton khanna_amol@bah.com Fred Lu Booz Allen Hamilton lu_fred@bah.com |
| Pseudocode | Yes | Algorithm 1 Standard Sparse-Aware Frank-Wolfe, Algorithm 2 Fast Sparse-Aware Frank-Wolfe for Linear Models, Algorithm 3 Fibonacci Heap Frank-Wolfe Queue Maintenance, Algorithm 4 Big-Step Little-Step Exponential Sampler (see the illustrative sketch below the table) |
| Open Source Code | No | Our code was written in Java due to the need for explicit looping, and implemented using the JSAT library [36]. There is no explicit statement of code release or a repository link for the specific contributions of this paper. |
| Open Datasets | Yes | Table 2: Datasets used for evaluation. We focus on cases that are high-dimensional and sparse. (Dataset, N, D): RCV1, 20,242, 47,236; News20, … |
| Dataset Splits | No | The paper uses the phrase "training iterations" and discusses evaluating on "test datasets" but does not explicitly mention validation splits or their proportions. |
| Hardware Specification | Yes | All experiments were run on a machine with 12 CPU cores (though only one core was used), and 128GB of RAM. |
| Software Dependencies | Yes | Our code was written in Java due to the need for explicit looping, and implemented using the JSAT library [36]. |
| Experiment Setup | Yes | Due to the computational requirements of running all tests, we fix the total number of iterations T = 4,000 and maximum L1 norm for the Lasso constraint to be λ = 50 in all tests across all datasets. This value produces highly accurate models in all non-private cases, and the goal of our work is not to perform hyper-parameter tuning but to demonstrate that we have taken an already known algorithm with established convergence rates and made each iteration more efficient. |
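
The Pseudocode row names the paper's Frank-Wolfe variants without detailing them. Below is a minimal, illustrative differentially private Frank-Wolfe loop for L1-constrained (LASSO) logistic regression, using the fixed T = 4,000 and λ = 50 from the Experiment Setup row. It is not the authors' sparse-aware implementation (which is written in Java with JSAT and relies on a Fibonacci-heap queue and a specialized exponential sampler); the noisy-max vertex selection, the naive per-iteration privacy split, and the Laplace noise scale are placeholder assumptions for illustration only.

```python
import numpy as np

def dp_frank_wolfe_logreg(X, y, lam=50.0, T=4000, eps=1.0, rng=None):
    """Illustrative DP Frank-Wolfe for L1-constrained logistic regression.

    Sketch only: noise scale and privacy accounting below are placeholders,
    not the calibrated values or the sparse-aware updates from the paper.
    Expects y in {0, 1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)
    per_iter_eps = eps / T  # naive composition; the paper uses tighter accounting
    for t in range(T):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))      # sigmoid(Xw)
        grad = X.T @ (p - y) / n                # gradient of the mean logistic loss
        # Vertices of the L1 ball are {+lam*e_j, -lam*e_j}; score each by <vertex, -grad>
        # and pick a noisy maximum (report-noisy-max) for privacy.
        scores = np.concatenate([-grad, grad])
        noise = rng.laplace(scale=1.0 / (n * per_iter_eps), size=2 * d)  # placeholder scale
        k = int(np.argmax(scores + noise))
        j, sign = (k, 1.0) if k < d else (k - d, -1.0)
        gamma = 2.0 / (t + 2.0)                 # standard Frank-Wolfe step size
        w *= (1.0 - gamma)                      # convex combination with the chosen vertex
        w[j] += gamma * sign * lam
    return w
```

Under these assumptions every iteration recomputes the full dense gradient; avoiding exactly that per-iteration cost on sparse data is what the paper's faster Frank-Wolfe iterations target.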