Banded Square Root Matrix Factorization for Differentially Private Model Training

Authors: Nikita Kalinin, Christoph H. Lampert

NeurIPS 2024

Reproducibility assessment. Each variable below lists the extracted result and the supporting excerpt (LLM response) from the paper.
Research Type: Experimental
Evidence: "Our numerical experiments demonstrate that models trained using BSR perform on par with the best existing methods, while completely avoiding their computational overhead."

Researcher Affiliation: Academia
Evidence: "Nikita Kalinin, Institute of Science and Technology (ISTA), Klosterneuburg, Austria, nikita.kalinin@ist.ac.at; Christoph Lampert, Institute of Science and Technology (ISTA), Klosterneuburg, Austria, chl@ist.ac.at"

Pseudocode: Yes
Evidence: "Algorithm 1 Differentially Private SGD with Matrix Factorization"

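The paper's Algorithm 1 is not reproduced in this report. As a rough illustration of the named technique, the sketch below implements the standard matrix-factorization variant of DP-SGD, in which the prefix sums A·G of clipped gradients are released as B(C·G + Z) = A·G + B·Z for a factorization A = B·C. The function name, signature, and the simplification to one averaged gradient per step (instead of per-example clipping) are our assumptions, not the paper's code.

```python
import numpy as np

def dp_sgd_with_matrix_factorization(grad_fn, x0, B, C, clip, sigma, lr):
    """Minimal sketch of DP-SGD driven by a factorization A = B C of the
    n x n prefix-sum workload A (lower-triangular all-ones matrix).

    grad_fn(x) is assumed to return one already-averaged gradient per
    step; a faithful implementation would clip per-example gradients.
    """
    n, d = B.shape[0], x0.shape[0]
    # Sensitivity of C under single participation: largest column norm.
    sens_C = np.max(np.linalg.norm(C, axis=0))
    # Correlated noise B @ Z, with Z i.i.d. Gaussian scaled by the noise
    # multiplier, the clipping norm, and the sensitivity of C.
    Z = sigma * clip * sens_C * np.random.randn(n, d)
    noise = B @ Z                      # one noise row per update step
    x, prev = x0.copy(), np.zeros(d)
    for t in range(n):
        g = grad_fn(x)
        g = g / max(1.0, np.linalg.norm(g) / clip)   # clip to norm <= clip
        # x_t = x_0 - lr * [A G + B Z]_t; telescoped per-step form:
        x = x - lr * (g + noise[t] - prev)
        prev = noise[t]
    return x
```
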
Open Source Code: Yes
Evidence: "To compute AOF, we solve the optimization problem (4) using the cvxpy package with SCS backend, see Algorithm B for the source code."

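Problem (4) itself is not quoted in this report, so the sketch below only illustrates the cvxpy-with-SCS workflow the authors describe, using a formulation that is common in the matrix-factorization literature: optimize the Gram matrix X = CᵀC of the factor C under a unit-diagonal sensitivity constraint, minimizing tr(A X⁻¹ Aᵀ) = ‖B‖²_F with B = A C⁻¹. The objective, workload, and variable names are assumptions, not the paper's code.

```python
import cvxpy as cp
import numpy as np

n = 32
A = np.tril(np.ones((n, n)))        # prefix-sum workload matrix

X = cp.Variable((n, n), PSD=True)   # Gram matrix X = C^T C
objective = cp.Minimize(cp.matrix_frac(A.T, X))   # tr(A X^{-1} A^T)
constraints = [cp.diag(X) == 1]     # unit column norms of C (sensitivity 1)
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.SCS)

# Recover the factorization A = B C from the optimal Gram matrix.
L = np.linalg.cholesky(X.value + 1e-9 * np.eye(n))   # jitter for stability
C = L.T                                              # X = C^T C
B = A @ np.linalg.inv(C)
```
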
Open Datasets: Yes
Evidence: "To demonstrate the usefulness of BSR in practical settings, we follow the setup of Kairouz et al. [2021] and report results for training a simple Conv Net on the CIFAR-10 dataset (see Table 1 in Appendix C for the architecture)."

Dataset Splits: Yes
Evidence: "In both cases, 20% of the training examples are used as validation sets to determine the learning rate η ∈ {0.01, 0.05, 0.1, 0.5, 1}, weight decay parameters α ∈ {0.99, 0.999, 0.9999, 1}, and momentum β ∈ {0, 0.9}."

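As a concrete reading of this setup, here is a minimal sketch of the 80/20 split and the quoted search grid; the random split, the seed, and all variable names are illustrative assumptions, since the paper's selection code is not shown here.

```python
import numpy as np

# Illustrative 80/20 train/validation split of the 50,000 training examples.
rng = np.random.default_rng(seed=0)        # seed is an assumption
perm = rng.permutation(50_000)
val_idx, train_idx = perm[:10_000], perm[10_000:]   # 20% held out

# Hyperparameter grid quoted above.
search_grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.5, 1.0],   # eta
    "weight_decay":  [0.99, 0.999, 0.9999, 1.0],    # alpha
    "momentum":      [0.0, 0.9],                    # beta
}
```
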
Hardware Specification: Yes
Evidence: "Note that while the experiments for BSR and CVX used a single-core CPU-only environment, the experiments for GD and LBFGS were run on an NVIDIA H100 GPU with 16 available CPU cores."

Software Dependencies: No
Evidence: The paper mentions software such as "python/numpy code", the "cvxpy package with SCS backend", "jax", and the "optax toolbox", but does not provide version numbers for these dependencies.

Experiment Setup: Yes
Evidence: "To reflect the setting of single-participation training, we split the 50,000 training examples into batches of size m ∈ {1000, 500, 250, 200, 100, 50, 25}, resulting in n ∈ {100, 200, 400, 500, 1000, 2000} update steps. For repeated participation, we fix the batch size to 500 and run k ∈ {1, 2, ..., 10, 15, 20} epochs of training, i.e. n = 100k and b = 100. In both cases, 20% of the training examples are used as validation sets to determine the learning rate η ∈ {0.01, 0.05, 0.1, 0.5, 1}, weight decay parameters α ∈ {0.99, 0.999, 0.9999, 1}, and momentum β ∈ {0, 0.9}."
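
For reference, a small sketch that computes the nominal step counts implied by the quoted schedules (n = 50,000 / m for a single pass, n = 100k for repeated participation); it is illustrative only and not taken from the paper's code.

```python
# Nominal update-step counts implied by the quoted setup (illustrative only).
n_train = 50_000

# Single participation: one pass over the data in batches of size m.
for m in [1000, 500, 250, 200, 100, 50, 25]:
    print(f"m={m:5d} -> n={n_train // m:5d} update steps")

# Repeated participation: batch size 500 gives b = 100 batches per epoch,
# so k epochs yield n = 100 * k update steps.
for k in list(range(1, 11)) + [15, 20]:
    print(f"k={k:2d} epochs -> n={100 * k:4d} update steps")
```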