Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DFML: Decentralized Federated Mutual Learning

Authors: Yasser H. Khalil, Amir Hossein Estiri, Mahdi Beitollahi, Nader Asadi, Sobhan Hemati, Xu Li, Guojun Zhang, Xi Chen

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experimental results demonstrate consistent effectiveness of DFML in both convergence speed and global accuracy, outperforming prevalent baselines under various conditions. For example, with the CIFAR-100 dataset and 50 clients, DFML achieves a substantial increase of +17.20% and +19.95% in global accuracy under Independent and Identically Distributed (IID) and non-IID data shifts, respectively.
Researcher Affiliation Industry Yasser H. Khalil EMAIL Huawei Noah's Ark Lab, Montreal, Canada. Amir H. Estiri EMAIL Huawei Noah's Ark Lab, Montreal, Canada. Mahdi Beitollahi EMAIL Huawei Noah's Ark Lab, Montreal, Canada. Nader Asadi EMAIL Huawei Noah's Ark Lab, Montreal, Canada. Sobhan Hemati EMAIL Huawei Noah's Ark Lab, Montreal, Canada. Xu Li EMAIL Huawei Technologies Canada Inc., Ottawa, Canada. Guojun Zhang EMAIL Huawei Noah's Ark Lab, Montreal, Canada. Xi Chen EMAIL Huawei Noah's Ark Lab, Montreal, Canada.
Pseudocode Yes Algorithm 1 DFML
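The paper's Algorithm 1 is not reproduced here, but the mutual-learning objective it builds on can be illustrated with a minimal sketch. This is not the authors' exact formulation; the loss below is the standard deep-mutual-learning form (cross-entropy on the true label plus a KL term pulling one model toward its peer's softened predictions), written per-example in pure Python for clarity. The temperature parameter matches the setup reported later (temperature 1).

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mutual_learning_loss(logits_a, logits_b, target, temperature=1.0):
    """Illustrative mutual-learning objective for one example:
    cross-entropy on the true label plus KL divergence toward the
    peer model's temperature-softened output (not the paper's exact
    Algorithm 1)."""
    p_a = softmax(logits_a, temperature)
    p_b = softmax(logits_b, temperature)
    ce = -math.log(softmax(logits_a)[target])
    kl = sum(pb * math.log(pb / pa) for pb, pa in zip(p_b, p_a))
    return ce + temperature**2 * kl
```

When both models agree exactly, the KL term vanishes and the loss reduces to plain cross-entropy.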
Open Source Code No The paper does not provide an explicit statement about the release of source code or a link to a code repository.
Open Datasets Yes We evaluate our proposed DFML against prevalent baselines using five datasets including CIFAR-10/100, FMNIST, Caltech101, Oxford Pets, and Stanford Cars.
Dataset Splits Yes Each split is further segmented into training and validation sets following an 80:20 ratio. For Caltech101, samples are first split 80:20, where the 20% represents the global test set, and the remaining samples follow the defined splitting strategy above.
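The 80:20 train/validation segmentation quoted above can be sketched in a few lines; the function name and seeded shuffle are illustrative assumptions, not taken from the paper.

```python
import random

def split_indices(n, val_frac=0.2, seed=0):
    """Shuffle sample indices and split them 80:20 into
    train/validation index lists (illustrative sketch)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * (1 - val_frac))
    return idx[:cut], idx[cut:]
```

For Caltech101 the same helper would be applied twice: once to carve out the 20% global test set, then again on the remainder.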
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types) used for running its experiments.
Software Dependencies No The paper mentions software components like "SGD optimizer" and "cosine annealing" but does not provide specific version numbers for any key software libraries or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes We utilize SGD optimizer for each client with momentum 0.9 and weight decay 5e-4. The learning rate is selected from {0.1, 0.01, 0.001}. The batch size is set to 8 for the EfficientNet experiments, 16 for the ResNet experiments using Caltech101, Oxford Pets, and Stanford Cars datasets, and a batch size of 64 is used for all other experiments. For the cyclic α scheduler, we apply cosine annealing. The initial oscillating period is set to 10 and is incrementally increased after each completion. α is oscillated from 0 to a maximum value selected from {0.8, 0.9, 1.0}. Figure 11 illustrates an example of the behavior of α throughout training. The number of mutual learning epochs K, performed at the aggregator, is set to 10. Moreover, the temperature is configured to 1. All experiments are repeated for 3 trials with random seeds.
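The cyclic α scheduler described above (cosine annealing from 0 to α_max, initial period 10, period growing after each completed cycle) can be sketched as follows. The exact growth rule for the period is not specified in the quoted text; this sketch assumes the period increases by the initial period after each cycle, which is one plausible reading.

```python
import math

def alpha_schedule(total_rounds, alpha_max=0.9, initial_period=10):
    """Cyclic cosine schedule oscillating alpha from 0 up to alpha_max.
    The oscillating period starts at initial_period and (by assumption)
    grows by initial_period after each completed cycle."""
    alphas = []
    period, pos = initial_period, 0
    for _ in range(total_rounds):
        # half-cosine ramp: 0 at pos=0, alpha_max at pos=period
        alphas.append(alpha_max * 0.5 * (1 - math.cos(math.pi * pos / period)))
        pos += 1
        if pos > period:  # cycle complete: restart and lengthen the period
            pos = 0
            period += initial_period
    return alphas
```

With α_max drawn from {0.8, 0.9, 1.0} as in the quoted setup, each cycle ramps α smoothly from 0 to its maximum before restarting over a longer window.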