Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials
Authors: Jonathan Scott, Áine Cahill
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that this algorithm is able to successfully infer meaningful parameters. We show that using these inferred parameters to create simulated clients on the server leads to more representative training simulations. Our experiments are implemented using the pfl-research framework (Granqvist et al., 2024). |
| Researcher Affiliation | Collaboration | 1Institute of Science and Technology Austria (ISTA) 2Apple. Correspondence to: Jonathan Scott <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Dirichlet-Multinomial Mixture Initialization Algorithm 2 Dirichlet-Multinomial Mixture MLE |
| Open Source Code | Yes | Our code can be found at https://github.com/apple/pfl-research/tree/develop/publications/mdm. |
| Open Datasets | Yes | We evaluate using synthetic data that follows the MDM distribution, CIFAR10 (Krizhevsky, 2009), FEMNIST (Caldas et al., 2018) and Folktables (Ding et al., 2021). |
| Dataset Splits | Yes | In Appendix A we outline a procedure for the server to choose the best value of K to use. ... 2. Sample a new cohort of clients that we have not yet seen and for each choice of K evaluate the log likelihood, Equation 4, on this cohort of clients. 3. Use the K that gave the highest log likelihood on this validation cohort of clients. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. It mentions 'on device training' and 'mobile devices' in a general context but not for the experimental setup. |
| Software Dependencies | No | The paper states 'Our experiments are implemented using the pfl-research framework (Granqvist et al., 2024)', but it does not specify version numbers for this framework or any other software dependencies like Python, PyTorch, TensorFlow, or specific libraries. |
| Experiment Setup | Yes | For CIFAR10 we vary the local batch size over [10, 15, 20, 25], the local number of epochs over [1, 2, 5, 10] and the local learning rate over [0.005, 0.01, 0.05, 0.1, 0.5]. ... Global learning rate for Fed Avg is 1.0, client cohort size is 50, and the number of global training rounds is 1500. For FEMNIST we vary the local number of epochs over [1, 2, 5, 10] and the local learning rate over [0.005, 0.01, 0.05]. ... Global learning rate for Fed Avg is 1.0, client cohort size is 50, the number of global training rounds is 1500 and the local batch size is 10. |
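The K-selection procedure quoted under "Dataset Splits" — fit a model for each candidate K, then keep the K with the highest log likelihood on a fresh validation cohort — can be sketched as below. This is a minimal illustration, not the paper's implementation: it assumes each client is summarized by a vector of label counts and each mixture component is a Dirichlet-Multinomial with its own concentration vector; all function names are hypothetical.

```python
import math

def dm_log_pmf(x, alpha):
    """Log pmf of a Dirichlet-Multinomial for an integer count vector x
    with concentration parameters alpha."""
    n = sum(x)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for xi, ai in zip(x, alpha):
        out += -math.lgamma(xi + 1) + math.lgamma(xi + ai) - math.lgamma(ai)
    return out

def mixture_log_likelihood(cohort, weights, alphas):
    """Log likelihood of a cohort of client count vectors under a
    K-component Dirichlet-Multinomial mixture (the role Equation 4
    plays in the paper's procedure)."""
    total = 0.0
    for x in cohort:
        terms = [math.log(w) + dm_log_pmf(x, a)
                 for w, a in zip(weights, alphas)]
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

def select_k(validation_cohort, fitted):
    """fitted: dict mapping K -> (weights, alphas) already estimated on
    training cohorts. Returns the K with the highest log likelihood on
    the held-out validation cohort."""
    return max(fitted,
               key=lambda k: mixture_log_likelihood(validation_cohort,
                                                    *fitted[k]))
```

Evaluating on a cohort of previously unseen clients, as the quote specifies, is what makes this a validation step rather than a training-fit comparison: larger K will always fit the training cohorts at least as well, so the held-out log likelihood is what guards against overfitting the mixture size.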