Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials
Authors: Jonathan Scott, Áine Cahill
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that this algorithm is able to successfully infer meaningful parameters. We show that using these inferred parameters to create simulated clients on the server leads to more representative training simulations. Our experiments are implemented using the pfl-research framework (Granqvist et al., 2024). |
| Researcher Affiliation | Collaboration | 1Institute of Science and Technology Austria (ISTA) 2Apple. Correspondence to: Jonathan Scott <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Dirichlet-Multinomial Mixture Initialization Algorithm 2 Dirichlet-Multinomial Mixture MLE |
| Open Source Code | Yes | Our code can be found at https://github.com/apple/pfl-research/tree/develop/publications/mdm. |
| Open Datasets | Yes | We evaluate using synthetic data that follows the MDM distribution, CIFAR10 (Krizhevsky, 2009), FEMNIST (Caldas et al., 2018) and Folktables (Ding et al., 2021). |
| Dataset Splits | Yes | In Appendix A we outline a procedure for the server to choose the best value of K to use. ... 2. Sample a new cohort of clients that we have not yet seen and for each choice of K evaluate the log likelihood, Equation 4, on this cohort of clients. 3. Use the K that gave the highest log likelihood on this validation cohort of clients. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. It mentions 'on device training' and 'mobile devices' in a general context but not for the experimental setup. |
| Software Dependencies | No | The paper states 'Our experiments are implemented using the pfl-research framework (Granqvist et al., 2024)', but it does not specify version numbers for this framework or any other software dependencies like Python, PyTorch, TensorFlow, or specific libraries. |
| Experiment Setup | Yes | For CIFAR10 we vary the local batch size over [10, 15, 20, 25], the local number of epochs over [1, 2, 5, 10] and the local learning rate over [0.005, 0.01, 0.05, 0.1, 0.5]. ... Global learning rate for Fed Avg is 1.0, client cohort size is 50, and the number of global training rounds is 1500. For FEMNIST we vary the local number of epochs over [1, 2, 5, 10] and the local learning rate over [0.005, 0.01, 0.05]. ... Global learning rate for Fed Avg is 1.0, client cohort size is 50, the number of global training rounds is 1500 and the local batch size is 10. |
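The K-selection procedure quoted under "Dataset Splits" — fit a model for each candidate K, then keep the K with the highest log likelihood on a fresh validation cohort — can be sketched as below. This is a minimal illustration, not the paper's implementation: it assumes each client is summarized by a vector of label counts and each mixture component is a Dirichlet-Multinomial with its own concentration vector; all function names are hypothetical.

```python
import math

def dm_log_pmf(x, alpha):
    """Log pmf of a Dirichlet-Multinomial for an integer count vector x
    with concentration parameters alpha."""
    n = sum(x)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for xi, ai in zip(x, alpha):
        out += -math.lgamma(xi + 1) + math.lgamma(xi + ai) - math.lgamma(ai)
    return out

def mixture_log_likelihood(cohort, weights, alphas):
    """Log likelihood of a cohort of client count vectors under a
    K-component Dirichlet-Multinomial mixture (the role Equation 4
    plays in the paper's procedure)."""
    total = 0.0
    for x in cohort:
        terms = [math.log(w) + dm_log_pmf(x, a)
                 for w, a in zip(weights, alphas)]
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

def select_k(validation_cohort, fitted):
    """fitted: dict mapping K -> (weights, alphas) already estimated on
    training cohorts. Returns the K with the highest log likelihood on
    the held-out validation cohort."""
    return max(fitted,
               key=lambda k: mixture_log_likelihood(validation_cohort,
                                                    *fitted[k]))
```

Evaluating on a cohort of previously unseen clients, as the quote specifies, is what makes this a validation step rather than a training-fit comparison: larger K will always fit the training cohorts at least as well, so the held-out log likelihood is what guards against overfitting the mixture size.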