(Amplified) Banded Matrix Factorization: A unified approach to private training

Authors: Christopher A. Choquette-Choo, Arun Ganesh, Ryan McKenna, H. Brendan McMahan, Keith Rush, Abhradeep Guha Thakurta, Zheng Xu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on example-level DP for image classification (on CIFAR-10) and user-level DP for next-word prediction (NWP) on Stack Overflow NWP focus on comparing our BANDMF with the existing state-of-the-art MULTI-EPOCH MF [15] and DP-SGD [1]. We train for 20 epochs on CIFAR-10, and tune all mechanisms to achieve their best performance for each ϵ, using 12 repeated runs.
Researcher Affiliation | Industry | Christopher A. Choquette-Choo (Google DeepMind, cchoquette@google.com); Arun Ganesh (Google Research, arunganesh@google.com); Ryan McKenna (Google Research, mckennar@google.com); H. Brendan McMahan (Google Research, mcmahan@google.com); Keith Rush (Google Research, krush@google.com); Abhradeep Thakurta (Google DeepMind, athakurta@google.com); Zheng Xu (Google Research, xuzheng@google.com)
Pseudocode | Yes | Algorithm 1 MF-DP-FTRL and DP-SGD; Algorithm 2 Sampling scheme; Algorithm 3 (VECSENS): maximum of ⟨v, u⟩ where u is a vector in the ℓ∞ unit ball satisfying Π_b; Algorithm 4 Efficient sensitivity upper bound for b-min-sep-participation; Algorithm 5 Efficient sensitivity calculation for b-min-sep-participation, assuming X is b-banded; Algorithm 8 Banded Matrix Multiplication; Algorithm 9 Banded Inverse Multiplication. (A sketch of the banded routines follows the table.)
Open Source Code | No | We will release all code with the final manuscript.
Open Datasets | Yes | Our experiments on example-level DP for image classification (on CIFAR-10) and user-level DP for next-word prediction (NWP) on Stack Overflow NWP focus on comparing our BANDMF with the existing state-of-the-art MULTI-EPOCH MF [15] and DP-SGD [1]... We next consider the now-standard Stack Overflow next-word-prediction (NWP) task with user-level differential privacy, again following [15] (full details in App. I)... We fine-tune a Spanish next word prediction model, pretrained on the multilingual C4 dataset [49, 60], with on-device user data using FL.
Dataset Splits | Yes | Validation accuracy, smoothed (%) (Table 6); validation accuracy smoothed over the final 400 rounds of training, used to select the best server learning rates for the comparison of test-set accuracy presented in Fig. 6(a). (A sketch of this selection rule follows the table.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for its experiments.
Software Dependencies | No | The dp_accounting library [18] is mentioned. TensorFlow Federated and TensorFlow Privacy are mentioned as future open-source components. No specific version numbers for these are provided in the main text in the context of experiment execution.
Experiment Setup | Yes | Our full experimental setup is described in App. H, and closely follows prior work [15]. We train for 20 epochs on CIFAR-10, and tune all mechanisms to achieve their best performance for each ϵ, using 12 repeated runs... We tune all jobs on a learning rate grid of coefficients in {1, 2, 5} on powers in [-2, 3]... a learning rate cooldown to 0.05× the initial learning rate over the last 500 steps of training... 2052 steps and 6 epochs, with B = 1000... For all SO NWP experiments we use the FedSGDM optimizer [50]... clipping to ζ = 1... SGDM with momentum parameter β = 0.95 and learning rate ηs... linear learning rate warmup from 0.05ηs to 1.0ηs over the first 15% of rounds (309), and a linear decay from 1.0ηs to 0.05ηs over the last 25% of rounds (513). (A sketch of the tuning grid and learning-rate schedule follows the table.)
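
The Pseudocode row lists banded matrix routines (Algorithms 8 and 9). As a rough illustration of what multiplying by a lower-triangular b-banded matrix and by its inverse involves, here is a minimal NumPy sketch; it assumes entries C[i, j] are nonzero only for max(0, i - b + 1) <= j <= i, and the function names are hypothetical rather than taken from the paper's (unreleased) code.

```python
import numpy as np

def banded_matvec(C, x, b):
    """Multiply a lower-triangular b-banded matrix C (2-D array) by x in O(n*b)."""
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        lo = max(0, i - b + 1)          # first column of the band in row i
        y[i] = C[i, lo:i + 1] @ x[lo:i + 1]
    return y

def banded_inverse_matvec(C, y, b):
    """Apply C^{-1} to y by forward substitution, touching only the b bands."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n):
        lo = max(0, i - b + 1)
        x[i] = (y[i] - C[i, lo:i] @ x[lo:i]) / C[i, i]
    return x
```

Storing only the b nonzero diagonals instead of the full matrix would cut memory from O(n^2) to O(nb); the dense indexing above is kept purely for readability.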
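
The Dataset Splits row notes that server learning rates were selected by validation accuracy smoothed over the final 400 rounds. A minimal sketch of that selection rule, assuming "smoothed" means a plain average over those rounds and using hypothetical variable names:

```python
import numpy as np

def select_server_lr(val_acc_by_lr, window=400):
    """Return the learning rate whose validation accuracy, averaged over the
    last `window` rounds, is highest. `val_acc_by_lr` maps a candidate learning
    rate to its per-round validation-accuracy history."""
    smoothed = {lr: float(np.mean(acc[-window:])) for lr, acc in val_acc_by_lr.items()}
    return max(smoothed, key=smoothed.get)
```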
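
The Experiment Setup row describes a {1, 2, 5} × 10^p tuning grid (p from -2 to 3) and a linear warmup/decay schedule for the server learning rate ηs. The sketch below is an interpretation of that description, not the paper's code: the conversion of the 15%/25% fractions into round counts simply truncates (the paper reports 309 warmup and 513 decay rounds out of 2052, without stating the rounding convention), and the helper names are hypothetical.

```python
def learning_rate_grid():
    # Coefficients {1, 2, 5} times 10**p for p in -2..3, i.e. 0.01 up to 5000.
    return sorted(c * 10.0 ** p for c in (1, 2, 5) for p in range(-2, 4))

def server_learning_rate(r, total_rounds, eta_s,
                         warmup_frac=0.15, decay_frac=0.25, floor=0.05):
    """Linear warmup from floor*eta_s to eta_s over the first warmup_frac of
    rounds, constant in the middle, then linear decay back to floor*eta_s over
    the last decay_frac of rounds. `r` is the 0-indexed round number."""
    warmup_rounds = int(warmup_frac * total_rounds)
    decay_start = total_rounds - int(decay_frac * total_rounds)
    if r < warmup_rounds:
        frac = r / max(warmup_rounds, 1)
    elif r >= decay_start:
        frac = (total_rounds - 1 - r) / max(total_rounds - 1 - decay_start, 1)
    else:
        frac = 1.0
    return eta_s * (floor + (1.0 - floor) * frac)
```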