Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy
Authors: Wei-Ning Chen, Berivan Isik, Peter Kairouz, Albert No, Sewoong Oh, Zheng Xu
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical evaluations on the privacy-utility trade-offs for both DP-SGD (under a non-streaming setting) and DP-FTRL type (with matrix mechanisms (Denisov et al., 2022)) algorithms. We mainly compare the L2-CSGM (Algorithm 1) and sparsified Gaussian matrix factorization (Algorithm 2) with the uncompressed Gaussian mechanism (Balle & Wang, 2018). We convert the Rényi DP bounds to (ε, δ)-DP via the conversion lemma from Canonne et al. (2020) for a fair comparison. Datasets and models. We run experiments on the full Federated EMNIST (Cohen et al., 2017) and Stack Overflow (Authors., 2019) dataset. |
| Researcher Affiliation | Collaboration | ¹Stanford University, ²Google, ³Yonsei University, ⁴University of Washington. |
| Pseudocode | Yes | Algorithm 1: L2-CSGM; Algorithm 2: Sparsified Gaussian Matrix Factorization; Algorithm 3: Sparsified Gaussian Matrix Factorization with Full Cohort Size |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about releasing the source code for the described methodology. |
| Open Datasets | Yes | We run experiments on the full Federated EMNIST (Cohen et al., 2017) and Stack Overflow (Authors., 2019) dataset. |
| Dataset Splits | No | The paper mentions using F-EMNIST and Stack Overflow datasets, and describes cohort sizes and epochs, but does not provide specific train/validation/test dataset split percentages or counts for reproducibility. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models, memory, or cloud computing instances used for the experiments. |
| Software Dependencies | No | The paper mentions using SGD and LSTM models, but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or library versions). |
| Experiment Setup | Yes | On F-EMNIST, we experiment with a (4 layer) Convolutional Neural Network (CNN)... On SONWP, we experiment with a 4 million parameters (4 layer) long short-term memory (LSTM) model... In both cases, clients train for 1 local epoch using SGD. Only the server uses momentum... For each local model update, we perform random rotation and $L_\infty$-clipping, with threshold $2\sqrt{2\log(dn)/d}$... We use the same optimal factorization as in Denisov et al. (2022) with T = 32 for 16 epochs... We observe that for the matrix mechanism, the compression rates are generally less than DP-FedAvg, and the performance is more sensitive to server learning rates and L2 clip norms. |
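
The Research Type row notes that Rényi DP bounds are converted to (ε, δ)-DP before mechanisms are compared. As a rough illustration of what such a conversion looks like, the sketch below applies one commonly used form of the improved RDP-to-(ε, δ) bound to a single Gaussian mechanism and minimizes over a grid of Rényi orders. The function names, the order grid, and the single-release Gaussian setting are all assumptions made for illustration; the paper's own accounting (composition over rounds, subsampling, matrix mechanisms) is not reproduced here and may differ.

```python
import math


def gaussian_rdp(alpha: float, sigma: float, sensitivity: float = 1.0) -> float:
    """Renyi DP of order alpha for one Gaussian mechanism release.

    Standard closed form: alpha * sensitivity^2 / (2 * sigma^2).
    """
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def rdp_to_epsilon(alpha: float, rdp: float, delta: float) -> float:
    """Convert an (alpha, rdp) Renyi DP guarantee to epsilon at a given delta.

    Assumed form of the improved conversion:
        eps = rdp + log(1 - 1/alpha) - log(delta * alpha) / (alpha - 1)
    Requires alpha > 1 and 0 < delta < 1.
    """
    return rdp + math.log1p(-1.0 / alpha) - math.log(delta * alpha) / (alpha - 1.0)


def best_epsilon(sigma: float, delta: float, orders=None) -> float:
    """Smallest epsilon over a grid of orders (hypothetical helper)."""
    if orders is None:
        # Illustrative grid of orders from 1.1 to 100.9; not taken from the paper.
        orders = [1.0 + k / 10.0 for k in range(1, 1000)]
    candidates = (
        rdp_to_epsilon(a, gaussian_rdp(a, sigma), delta) for a in orders
    )
    # Epsilon is never negative, so clamp each candidate at zero.
    return min(max(0.0, eps) for eps in candidates)


if __name__ == "__main__":
    for sigma in (1.0, 2.0, 4.0):
        print(f"sigma={sigma}: epsilon ~ {best_epsilon(sigma, delta=1e-5):.3f}")
```

Reporting every mechanism at the same δ through the same conversion is what makes the resulting ε values comparable across the compressed and uncompressed baselines.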