Banded Square Root Matrix Factorization for Differentially Private Model Training

Authors: Nikita Kalinin, Christoph H. Lampert

NeurIPS 2024

Reproducibility assessment. Each variable below lists the extracted result and the supporting excerpt (LLM response) from the paper.
Research Type: Experimental
Evidence: "Our numerical experiments demonstrate that models trained using BSR perform on par with the best existing methods, while completely avoiding their computational overhead."

Researcher Affiliation: Academia
Evidence: "Nikita Kalinin, Institute of Science and Technology (ISTA), Klosterneuburg, Austria, nikita.kalinin@ist.ac.at; Christoph Lampert, Institute of Science and Technology (ISTA), Klosterneuburg, Austria, chl@ist.ac.at"

Pseudocode: Yes
Evidence: "Algorithm 1 Differentially Private SGD with Matrix Factorization"

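The paper's Algorithm 1 is not reproduced in this report. As a rough illustration of the named technique, the sketch below implements the standard matrix-factorization variant of DP-SGD, in which the prefix sums A·G of clipped gradients are released as B(C·G + Z) = A·G + B·Z for a factorization A = B·C. The function name, signature, and the simplification to one averaged gradient per step (instead of per-example clipping) are our assumptions, not the paper's code.

```python
import numpy as np

def dp_sgd_with_matrix_factorization(grad_fn, x0, B, C, clip, sigma, lr):
    """Minimal sketch of DP-SGD driven by a factorization A = B C of the
    n x n prefix-sum workload A (lower-triangular all-ones matrix).

    grad_fn(x) is assumed to return one already-averaged gradient per
    step; a faithful implementation would clip per-example gradients.
    """
    n, d = B.shape[0], x0.shape[0]
    # Sensitivity of C under single participation: largest column norm.
    sens_C = np.max(np.linalg.norm(C, axis=0))
    # Correlated noise B @ Z, with Z i.i.d. Gaussian scaled by the noise
    # multiplier, the clipping norm, and the sensitivity of C.
    Z = sigma * clip * sens_C * np.random.randn(n, d)
    noise = B @ Z                      # one noise row per update step
    x, prev = x0.copy(), np.zeros(d)
    for t in range(n):
        g = grad_fn(x)
        g = g / max(1.0, np.linalg.norm(g) / clip)   # clip to norm <= clip
        # x_t = x_0 - lr * [A G + B Z]_t; telescoped per-step form:
        x = x - lr * (g + noise[t] - prev)
        prev = noise[t]
    return x
```
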
Open Source Code: Yes
Evidence: "To compute AOF, we solve the optimization problem (4) using the cvxpy package with SCS backend, see Algorithm B for the source code."

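Problem (4) itself is not quoted in this report, so the sketch below only illustrates the cvxpy-with-SCS workflow the authors describe, using a formulation that is common in the matrix-factorization literature: optimize the Gram matrix X = CᵀC of the factor C under a unit-diagonal sensitivity constraint, minimizing tr(A X⁻¹ Aᵀ) = ‖B‖²_F with B = A C⁻¹. The objective, workload, and variable names are assumptions, not the paper's code.

```python
import cvxpy as cp
import numpy as np

n = 32
A = np.tril(np.ones((n, n)))        # prefix-sum workload matrix

X = cp.Variable((n, n), PSD=True)   # Gram matrix X = C^T C
objective = cp.Minimize(cp.matrix_frac(A.T, X))   # tr(A X^{-1} A^T)
constraints = [cp.diag(X) == 1]     # unit column norms of C (sensitivity 1)
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.SCS)

# Recover the factorization A = B C from the optimal Gram matrix.
L = np.linalg.cholesky(X.value + 1e-9 * np.eye(n))   # jitter for stability
C = L.T                                              # X = C^T C
B = A @ np.linalg.inv(C)
```
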
Open Datasets: Yes
Evidence: "To demonstrate the usefulness of BSR in practical settings, we follow the setup of Kairouz et al. [2021] and report results for training a simple Conv Net on the CIFAR-10 dataset (see Table 1 in Appendix C for the architecture)."

Dataset Splits: Yes
Evidence: "In both cases, 20% of the training examples are used as validation sets to determine the learning rate η ∈ {0.01, 0.05, 0.1, 0.5, 1}, weight decay parameters α ∈ {0.99, 0.999, 0.9999, 1}, and momentum β ∈ {0, 0.9}."

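As a concrete reading of this setup, here is a minimal sketch of the 80/20 split and the quoted search grid; the random split, the seed, and all variable names are illustrative assumptions, since the paper's selection code is not shown here.

```python
import numpy as np

# Illustrative 80/20 train/validation split of the 50,000 training examples.
rng = np.random.default_rng(seed=0)        # seed is an assumption
perm = rng.permutation(50_000)
val_idx, train_idx = perm[:10_000], perm[10_000:]   # 20% held out

# Hyperparameter grid quoted above.
search_grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.5, 1.0],   # eta
    "weight_decay":  [0.99, 0.999, 0.9999, 1.0],    # alpha
    "momentum":      [0.0, 0.9],                    # beta
}
```
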
Hardware Specification: Yes
Evidence: "Note that while the experiments for BSR and CVX used a single-core CPU-only environment, the experiments for GD and LBFGS were run on an NVIDIA H100 GPU with 16 available CPU cores."

Software Dependencies: No
Evidence: The paper mentions software such as "python/numpy code", the "cvxpy package with SCS backend", "jax", and the "optax toolbox", but does not provide version numbers for these dependencies.

Experiment Setup: Yes
Evidence: "To reflect the setting of single-participation training, we split the 50,000 training examples into batches of size m ∈ {1000, 500, 250, 200, 100, 50, 25}, resulting in n ∈ {100, 200, 400, 500, 1000, 2000} update steps. For repeated participation, we fix the batch size to 500 and run k ∈ {1, 2, ..., 10, 15, 20} epochs of training, i.e. n = 100k and b = 100. In both cases, 20% of the training examples are used as validation sets to determine the learning rate η ∈ {0.01, 0.05, 0.1, 0.5, 1}, weight decay parameters α ∈ {0.99, 0.999, 0.9999, 1}, and momentum β ∈ {0, 0.9}."
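
For reference, a small sketch that computes the nominal step counts implied by the quoted schedules (n = 50,000 / m for a single pass, n = 100k for repeated participation); it is illustrative only and not taken from the paper's code.

```python
# Nominal update-step counts implied by the quoted setup (illustrative only).
n_train = 50_000

# Single participation: one pass over the data in batches of size m.
for m in [1000, 500, 250, 200, 100, 50, 25]:
    print(f"m={m:5d} -> n={n_train // m:5d} update steps")

# Repeated participation: batch size 500 gives b = 100 batches per epoch,
# so k epochs yield n = 100 * k update steps.
for k in list(range(1, 11)) + [15, 20]:
    print(f"k={k:2d} epochs -> n={100 * k:4d} update steps")
```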