(Amplified) Banded Matrix Factorization: A unified approach to private training
Authors: Christopher A. Choquette-Choo, Arun Ganesh, Ryan McKenna, H. Brendan McMahan, Keith Rush, Abhradeep Guha Thakurta, Zheng Xu
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on example-level DP for image classification (on CIFAR-10) and user-level DP for next word prediction (NWP) on Stack Overflow focus on comparing our BANDMF with the existing state-of-the-art MULTI-EPOCH MF [15] and DP-SGD [1]. We train for 20 epochs on CIFAR-10, and tune all mechanisms to achieve their best performance for each ϵ, using 12 repeated runs. |
| Researcher Affiliation | Industry | Christopher A. Choquette-Choo, Google DeepMind, cchoquette@google.com; Arun Ganesh, Google Research, arunganesh@google.com; Ryan McKenna, Google Research, mckennar@google.com; H. Brendan McMahan, Google Research, mcmahan@google.com; Keith Rush, Google Research, krush@google.com; Abhradeep Thakurta, Google DeepMind, athakurta@google.com; Zheng Xu, Google Research, xuzheng@google.com |
| Pseudocode | Yes | Algorithm 1 MF-DP-FTRL and DP-SGD; Algorithm 2 Sampling scheme; Algorithm 3 (VECSENS): maximum of ⟨v, u⟩ where u is a vector in the ℓ unit ball satisfying Π_b; Algorithm 4 Efficient sensitivity upper bound for b-min-sep-participation; Algorithm 5 Efficient sensitivity calculation for b-min-sep-participation, assuming X is b-banded; Algorithm 8 Banded Matrix Multiplication; Algorithm 9 Banded Inverse Multiplication (an illustrative banded-multiplication sketch appears after the table). |
| Open Source Code | No | We will release all code with the final manuscript. |
| Open Datasets | Yes | Our experiments on example-level DP for image classification (on CIFAR-10) and user-level DP for next word prediction (NWP) on Stack Overflow focus on comparing our BANDMF with the existing state-of-the-art MULTI-EPOCH MF [15] and DP-SGD [1]...We next consider the now-standard Stack Overflow next-word-prediction (NWP) task with user-level differential privacy, again following [15] (full details in App. I)...We fine-tune a Spanish next word prediction model, pretrained on the multilingual C4 dataset [49, 60], with on-device user data using FL. |
| Dataset Splits | Yes | Validation accuracy, smoothed (%) (Table 6); validation accuracy smoothed over the final 400 rounds of training, used to select the best server learning rates for the comparison of test-set accuracy presented in Fig. 6(a). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for its experiments. |
| Software Dependencies | No | The dp_accounting library [18] is mentioned, and TensorFlow Federated and TensorFlow Privacy are mentioned as future open-source components, but no specific version numbers are provided in the main text for the experimental setup. |
| Experiment Setup | Yes | Our full experimental setup is described in App. H, and closely follows prior work [15]. We train for 20 epochs on CIFAR-10, and tune all mechanisms to achieve their best performance for each ϵ, using 12 repeated runs...We tune all jobs on a learning rate grid of coefficients in {1, 2, 5} times powers of ten in [-2, 3]...a learning rate cooldown to 0.05× the initial learning rate over the last 500 steps of training...2052 steps and 6 epochs, with B = 1000...For all SO NWP experiments we use the FedSGDM optimizer [50]...clipping to ζ = 1...SGDM with momentum parameter β = 0.95 and learning rate ηs...linear learning rate warmup from 0.05ηs to 1.0ηs over the first 15% of rounds (309), and a linear decay from 1.0ηs to 0.05ηs over the last 25% of rounds (513) (a sketch of this schedule and the tuning grid appears after the table). |
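The Pseudocode row lists Algorithms 8 and 9 for banded matrix multiplication and banded inverse multiplication. As a minimal illustrative sketch (not the paper's actual Algorithm 8), the snippet below shows the property those routines exploit: for a b-banded lower-triangular matrix, each output row depends on at most b inputs, so the product can be formed in O(n·b) work instead of O(n²). The dense storage of `C` and the name `banded_matmul` are assumptions made for this example.

```python
import numpy as np

def banded_matmul(C: np.ndarray, x: np.ndarray, b: int) -> np.ndarray:
    """Compute y = C @ x where C is lower-triangular and b-banded.

    Assumes C[i, j] == 0 unless i - b < j <= i, so each output row touches
    at most b inputs; x may be a vector or an (n, d) stack of gradients.
    """
    n = x.shape[0]
    y = np.zeros_like(x)
    for i in range(n):
        lo = max(0, i - b + 1)
        # Only the in-band slice of row i contributes to output i.
        y[i] = C[i, lo:i + 1] @ x[lo:i + 1]
    return y

# Sanity check against a dense multiply on a random 3-banded matrix.
n, b = 6, 3
band_mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) < b
C = np.tril(np.random.rand(n, n)) * band_mask
x = np.random.rand(n)
assert np.allclose(banded_matmul(C, x, b), C @ x)
```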
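The Experiment Setup row quotes a learning-rate tuning grid and a warmup/cooldown schedule for the Stack Overflow NWP runs. The sketch below reproduces those two details under stated assumptions: the helper name `server_lr_schedule` and the piecewise-linear interpolation are illustrative choices; only the 0.05ηs endpoints, the 15% warmup, the 25% decay, and the {1, 2, 5} × 10^p, p ∈ [-2, 3] grid come from the quoted text.

```python
def server_lr_schedule(round_idx: int, total_rounds: int, eta_s: float,
                       warmup_frac: float = 0.15, decay_frac: float = 0.25,
                       floor: float = 0.05) -> float:
    """Linear warmup from floor*eta_s over the first warmup_frac of rounds,
    constant eta_s in the middle, and linear decay back to floor*eta_s over
    the last decay_frac of rounds (a sketch of the schedule quoted above)."""
    warmup = int(warmup_frac * total_rounds)
    decay = int(decay_frac * total_rounds)
    if round_idx < warmup:
        frac = round_idx / max(warmup, 1)
    elif round_idx >= total_rounds - decay:
        frac = (total_rounds - 1 - round_idx) / max(decay, 1)
    else:
        frac = 1.0
    return (floor + (1.0 - floor) * frac) * eta_s

# Learning-rate tuning grid: coefficients {1, 2, 5} times powers of ten in [-2, 3].
lr_grid = sorted(c * 10.0 ** p for c in (1, 2, 5) for p in range(-2, 4))
```

With total_rounds = 2052 this yields roughly the 309-round warmup and 513-round decay quoted above; the paper's exact rounding may differ.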