Banded Square Root Matrix Factorization for Differentially Private Model Training
Authors: Nikita Kalinin, Christoph H. Lampert
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments demonstrate that models trained using BSR perform on par with the best existing methods, while completely avoiding their computational overhead. |
| Researcher Affiliation | Academia | Nikita Kalinin, Institute of Science and Technology (ISTA), Klosterneuburg, Austria, nikita.kalinin@ist.ac.at; Christoph Lampert, Institute of Science and Technology (ISTA), Klosterneuburg, Austria, chl@ist.ac.at |
| Pseudocode | Yes | Algorithm 1: Differentially Private SGD with Matrix Factorization (a minimal sketch of this mechanism appears after the table) |
| Open Source Code | Yes | To compute AOF, we solve the optimization problem (4) using the cvxpy package with SCS backend, see Algorithm B for the source code. (A hedged cvxpy sketch of a related factorization program appears after the table.) |
| Open Datasets | Yes | To demonstrate the usefulness of BSR in practical settings, we follow the setup of Kairouz et al. [2021] and report results for training a simple Conv Net on the CIFAR-10 dataset (see Table 1 in Appendix C for the architecture). |
| Dataset Splits | Yes | In both cases, 20% of the training examples are used as validation sets to determine the learning rate η ∈ {0.01, 0.05, 0.1, 0.5, 1}, weight decay parameters α ∈ {0.99, 0.999, 0.9999, 1}, and momentum β ∈ {0, 0.9}. |
| Hardware Specification | Yes | Note that while the experiments for BSR and CVX used a single-core CPU-only environment, the experiments for GD and LBFGS were run on an NVIDIA H100 GPU with 16 available CPU cores. |
| Software Dependencies | No | The paper mentions software like 'python/numpy code', 'cvxpy package with SCS backend', 'jax', and 'optax toolbox', but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | To reflect the setting of single-participation training, we split the 50,000 training examples into batches of size m ∈ {1000, 500, 250, 200, 100, 50, 25}, resulting in n ∈ {100, 200, 400, 500, 1000, 2000} update steps. For repeated participation, we fix the batch size to 500 and run k ∈ {1, 2, ..., 10, 15, 20} epochs of training, i.e. n = 100k and b = 100. In both cases, 20% of the training examples are used as validation sets to determine the learning rate η ∈ {0.01, 0.05, 0.1, 0.5, 1}, weight decay parameters α ∈ {0.99, 0.999, 0.9999, 1}, and momentum β ∈ {0, 0.9}. (These grids are collected into a configuration sketch after the table.) |
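
The pseudocode row points to Algorithm 1, differentially private SGD with matrix factorization. The snippet below is a minimal numpy sketch of the general mechanism that name refers to: the prefix-sum workload matrix A is factored as A = BC, and the mechanism releases B(CG + σZ) instead of AG, so the injected Gaussian noise is correlated across update steps. The square-root coefficients and the banding step shown here follow the standard construction for the all-ones lower-triangular matrix; the exact banding and rescaling used for BSR in the paper may differ, and all numeric values are placeholders rather than the paper's settings.

```python
# Hedged sketch (not the authors' code) of the matrix-factorization mechanism
# with a (banded) square-root factor of the prefix-sum workload matrix.
import numpy as np

def sqrt_coefficients(n):
    """Taylor coefficients of (1 - x)^(-1/2): r_0 = 1, r_k = r_{k-1} * (2k - 1) / (2k).
    The lower-triangular Toeplitz matrix built from them is a square root of the
    all-ones lower-triangular prefix-sum matrix A."""
    r = np.empty(n)
    r[0] = 1.0
    for k in range(1, n):
        r[k] = r[k - 1] * (2 * k - 1) / (2 * k)
    return r

def toeplitz_lower(coeffs):
    """Lower-triangular Toeplitz matrix with entry (i, j) equal to coeffs[i - j]."""
    n = len(coeffs)
    M = np.zeros((n, n))
    for i in range(n):
        M[i, : i + 1] = coeffs[: i + 1][::-1]
    return M

n, band = 8, 3
A = np.tril(np.ones((n, n)))                  # prefix-sum workload matrix
S = toeplitz_lower(sqrt_coefficients(n))      # exact square root: S @ S == A
assert np.allclose(S @ S, A)

coeffs = sqrt_coefficients(n)
coeffs[band:] = 0.0                           # banding: keep only the first `band` diagonals
C = toeplitz_lower(coeffs)                    # banded square-root factor (illustrative choice)
B = A @ np.linalg.inv(C)                      # so that A = B @ C holds exactly

# Matrix-factorization DP-SGD releases B @ (C @ G + sigma * Z) instead of A @ G,
# where G holds clipped per-step gradients and Z is i.i.d. Gaussian noise.
rng = np.random.default_rng(0)
d, sigma = 4, 1.0                             # toy model dimension and noise scale (placeholders)
G = rng.normal(size=(n, d))
Z = rng.normal(size=(n, d))
noisy_prefix_sums = B @ (C @ G + sigma * Z)   # equals A @ G plus correlated noise B @ (sigma * Z)
```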
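The open-source-code row quotes the use of cvxpy with the SCS backend to compute AOF by solving the paper's optimization problem (4). Problem (4) itself is not reproduced in this summary, so the sketch below shows only a commonly used single-participation formulation of the optimal-factorization problem over the Gram matrix X = CᵀC (minimize tr(A X⁻¹ Aᵀ) subject to diag(X) = 1 and X ⪰ 0); the paper's formulation, in particular for repeated participation, may differ or add constraints.

```python
# Hedged sketch: a standard convex program for optimizing a matrix factorization
# A = B @ C via the Gram matrix X = C^T C, solved with cvxpy + SCS.  This is an
# illustration of the kind of program AOF solves, not the paper's problem (4).
import cvxpy as cp
import numpy as np

n = 16
A = np.tril(np.ones((n, n)))                      # prefix-sum workload matrix

X = cp.Variable((n, n), PSD=True)                 # Gram matrix X = C^T C
objective = cp.Minimize(cp.matrix_frac(A.T, X))   # tr(A X^{-1} A^T) = ||B||_F^2 for B = A C^{-1}
constraints = [cp.diag(X) == 1]                   # unit column norms of C (single participation)
cp.Problem(objective, constraints).solve(solver=cp.SCS)

# Recover a factorization A = B @ C from the optimal Gram matrix.
L = np.linalg.cholesky(X.value + 1e-9 * np.eye(n))   # small jitter for numerical safety
C = L.T                                           # C^T C = L L^T = X
B = A @ np.linalg.inv(C)
```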
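For reference, the batch-size, epoch, and hyperparameter grids quoted in the experiment-setup and dataset-splits rows can be gathered into one configuration. The dictionary below simply restates those quoted values; the key names and structure are illustrative and not taken from the paper.

```python
# Search grids quoted in the table above, collected in one place.
single_participation = {
    "batch_size_m": [1000, 500, 250, 200, 100, 50, 25],
    "update_steps_n": [100, 200, 400, 500, 1000, 2000],
}
repeated_participation = {
    "batch_size": 500,
    "epochs_k": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20],  # quoted as "1, 2, ..., 10, 15, 20"
}
hyperparameter_grid = {
    "learning_rate_eta": [0.01, 0.05, 0.1, 0.5, 1],
    "weight_decay_alpha": [0.99, 0.999, 0.9999, 1],
    "momentum_beta": [0, 0.9],
}
validation_fraction = 0.2  # 20% of the training examples used as validation set
```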