Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Federated Learning Based on Dynamic Regularization
Authors: Durmus Alp Emre Acar, Yue Zhao, Ramon Matas, Matthew Mattina, Paul Whatmough, Venkatesh Saligrama
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments on both visual and language real-world datasets including MNIST, EMNIST, CIFAR-10, CIFAR-100 and Shakespeare. We tabulate performance studying cases that are reflective of FL scenarios... Our goal in this section is to evaluate FedDyn against competing methods on benchmark datasets for various FL scenarios. |
| Researcher Affiliation | Collaboration | Durmus Alp Emre Acar, Yue Zhao, Ramon Matas Navarro, Matthew Mattina, Paul N. Whatmough, Venkatesh Saligrama — Boston University, Boston, MA; Arm ML Research Lab, Boston, MA |
| Pseudocode | Yes | Algorithm 1: Federated Dynamic Regularizer (FedDyn) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | Datasets. We used benchmark datasets with the same train/test splits as in previous works (McMahan et al., 2017; Li et al., 2020a) which are MNIST (LeCun et al., 1998), CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), a subset of EMNIST (Cohen et al., 2017) (EMNIST-L), Shakespeare (Shakespeare, 1994) as well as a synthetic dataset. |
| Dataset Splits | No | The paper states 'We use the usual train and test splits for MNIST, EMNIST-L, CIFAR-10 and CIFAR-100' and summarizes 'The number of training and test samples of the benchmark datasets are summarized in Table 3'. However, it does not explicitly mention details about a validation set split (percentages, counts, or how it's used). |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU models, CPU types, or memory specifications. Mentions of 'mobile and IoT devices' refer to the application context, not the experimental setup. |
| Software Dependencies | No | The paper mentions general tools or approaches like 'SGD procedure' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or library versions) that would be needed for replication. |
| Experiment Setup | Yes | We consider different hyperparameter configurations for different setups and datasets. For all the experiments, we fix batch size as 50 for MNIST, CIFAR-10, CIFAR-100 and EMNIST-L datasets and as 100 for Shakespeare dataset. We test learning rates in [1, .1] and epochs in [1, 10, 50] for all three algorithms. α parameter of FedDyn is chosen among [.1, .01, .001]; K parameter of SCAFFOLD is searched in [20, 200, 1000]... and µ regularization hyperparameter of FedProx in [0.01, .0001]. The same hyperparameters are applied to all the CIFAR-10 experiments, including: 0.1 for learning rate, 5 for epochs, and 10^-3 for weight decay. |
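The Pseudocode row refers to Algorithm 1 (FedDyn). As a reading aid, here is a minimal sketch of the dynamic-regularization update on a toy problem — not the authors' implementation. It assumes full client participation and synthetic quadratic client losses `L_k(theta) = 0.5 * ||theta - c_k||^2` (so the regularized local problem has a closed-form minimizer); the function name `feddyn_round` and the variable names are illustrative choices, and `alpha` plays the role of the α hyperparameter searched over in the Experiment Setup row.

```python
import numpy as np

def feddyn_round(server_theta, h, grads, centers, alpha):
    """One sketched FedDyn round with full client participation.

    Each client k minimizes its dynamically regularized objective
        L_k(theta) - <g_k, theta> + (alpha / 2) * ||theta - server_theta||^2,
    where g_k is the client's accumulated gradient state. For the toy
    quadratic loss L_k(theta) = 0.5 * ||theta - c_k||^2 the minimizer is
    closed form: theta_k = (c_k + g_k + alpha * server_theta) / (1 + alpha).
    """
    new_thetas, new_grads = [], []
    for g_k, c_k in zip(grads, centers):
        theta_k = (c_k + g_k + alpha * server_theta) / (1 + alpha)
        # client state update: g_k <- g_k - alpha * (theta_k - server_theta)
        new_grads.append(g_k - alpha * (theta_k - server_theta))
        new_thetas.append(theta_k)
    # server state and model update (averaging plus the h correction term)
    h = h - alpha * np.mean([t - server_theta for t in new_thetas], axis=0)
    server_theta = np.mean(new_thetas, axis=0) - h / alpha
    return server_theta, h, new_grads

# Two clients whose local optima disagree (0.0 vs 2.0), mimicking data
# heterogeneity; the minimizer of the summed losses is their mean, 1.0.
centers = [np.array([0.0]), np.array([2.0])]
theta, h = np.zeros(1), np.zeros(1)
grads = [np.zeros(1), np.zeros(1)]
for _ in range(100):
    theta, h, grads = feddyn_round(theta, h, grads, centers, alpha=0.1)
print(float(theta[0]))  # approaches the global minimizer, 1.0
```

The point of the sketch is the fixed-point property the paper emphasizes: once the per-client states `g_k` stop changing, every client's regularized minimizer coincides with the server model at the global optimum, so convergence does not require client data to be homogeneous.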