Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning

Authors: Christopher A. Choquette-Choo, Hugh Brendan McMahan, J. Keith Rush, Abhradeep Guha Thakurta

ICML 2023

Reproducibility assessment: each entry below lists the reproducibility variable, the assessed result, and the LLM response supporting that assessment.
Research Type: Experimental. LLM response: "Extensive empirical evaluation on both example-level DP for image classification and user-level DP for language modeling demonstrate substantial improvements over all previous methods, including the widely-used DP-SGD."
Researcher Affiliation: Industry. LLM response: "Google Research. Correspondence to: <{cchoquette,krush,mcmahan,athakurta}@google.com>."
Pseudocode: Yes. LLM response: "Algorithm 1 DP-Prefix Sum Computation via FFT (with d = 1)" (an illustrative sketch of the underlying matrix-factorization mechanism appears after these entries).
Open Source Code: Yes. LLM response: "Our code is at: https://github.com/google-research/federated/tree/master/multi_epoch_dp_matrix_factorization"
Open Datasets: Yes. LLM response: "We train image classification models on CIFAR10 (Krizhevsky, 2009)..." and "We use the standard benchmark: Stack Overflow next-word prediction (Reddi et al., 2020)."
Dataset Splits: Yes. LLM response: "We train image-classification models using the CIFAR10 dataset as hosted in tensorflow-datasets, containing 50,000 training and 10,000 test examples." (A loading sketch via tensorflow-datasets appears after these entries.)
Hardware Specification: No. LLM response: The paper mentions a "V100 GPU" in the context of the computational cost of one component (the optimal FFT decoder) but does not provide specific hardware details for running its main experiments.
Software Dependencies: No. LLM response: The paper mentions "tensorflow-datasets" and "NumPy" (in Appendix K) but does not specify version numbers for these or other software dependencies.
Experiment Setup: Yes. LLM response: "Models trained for 20 epochs on CIFAR10 with a batch size of 500. We sweep over learning rates of values (1 × 10^i, 2 × 10^i, 5 × 10^i) for i in {-2, -1}; we sweep over momentum values of 0, 0.85, 0.9, 0.95." (The resulting sweep grid is enumerated in a sketch below.)
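
The Pseudocode entry cites Algorithm 1, a DP prefix-sum computation with an FFT-based decoder. Below is a minimal sketch of the plain matrix-factorization mechanism such an algorithm builds on: privately releasing prefix sums A x by factoring A = B C, noising C x, and decoding with B. The function name, the SciPy-based square-root factorization, and the single-participation sensitivity bound are assumptions for illustration; this is not the paper's optimized multi-epoch factorization or its FFT decoder.

    import numpy as np
    from scipy.linalg import sqrtm

    def mf_dp_prefix_sums(x, sigma, rng=None):
        """Differentially private prefix sums of x via a factorization A = B @ C.

        x:     length-n array of per-step quantities; each entry is assumed to come
               from a distinct participant and to have L2 norm at most 1.
        sigma: Gaussian noise multiplier.
        """
        rng = np.random.default_rng() if rng is None else rng
        n = len(x)
        A = np.tril(np.ones((n, n)))     # prefix-sum workload: (A @ x)[t] = sum of x[0..t]
        B = C = np.real(sqrtm(A))        # simple square-root factorization, so B @ C = A
        sensitivity = np.linalg.norm(C, axis=0).max()  # max column L2 norm of C
        z = rng.normal(scale=sigma * sensitivity, size=n)
        return B @ (C @ x + z)           # noisy estimate of the prefix sums A @ x

    noisy = mf_dp_prefix_sums(np.array([1.0, 0.5, -0.2, 0.8]), sigma=1.0)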
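
The Dataset Splits entry quotes the CIFAR10 split sizes as hosted in tensorflow-datasets. Here is a brief sketch of loading those splits; the as_supervised flag and the printed counts are illustrative and do not reproduce the paper's input pipeline.

    import tensorflow_datasets as tfds

    # Load the standard CIFAR10 splits referenced above (50,000 train / 10,000 test).
    (train_ds, test_ds), info = tfds.load(
        "cifar10", split=["train", "test"], as_supervised=True, with_info=True)
    print(info.splits["train"].num_examples)  # 50000
    print(info.splits["test"].num_examples)   # 10000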
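
The Experiment Setup entry describes a sweep over learning rates (1 × 10^i, 2 × 10^i, 5 × 10^i) for i in {-2, -1} and momentum values 0, 0.85, 0.9, 0.95. The following sketch enumerates that grid; the variable names are assumptions and the training loop itself is not shown.

    # Enumerate the hyperparameter grid described in the Experiment Setup entry.
    learning_rates = [m * 10.0 ** i for i in (-2, -1) for m in (1, 2, 5)]
    # -> six values: 0.01, 0.02, 0.05, 0.1, 0.2, 0.5
    momentum_values = [0.0, 0.85, 0.9, 0.95]

    sweep = [(lr, mom) for lr in learning_rates for mom in momentum_values]
    print(f"{len(sweep)} configurations")  # 24 configurations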