Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Homogenizing Non-IID Datasets via In-Distribution Knowledge Distillation for Decentralized Learning

Authors: Deepak Ravikumar, Gobinda Saha, Sai Aparna Aketi, Kaushik Roy

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental Our experiments on multiple image classification datasets and graph topologies show that the proposed IDKD scheme is more effective than traditional knowledge distillation and achieves state-of-the-art generalization performance on heterogeneously distributed data with minimal communication overhead. ... We evaluate the performance of the proposed IDKD methodology on various datasets and graph topologies and compare it against the current state-of-the-art decentralized learning algorithm. We chose QG-DSGDm-N (Lin et al., 2021) as our primary baseline as it achieves state-of-the-art performance on heterogeneous data
Researcher Affiliation Academia Deepak Ravikumar EMAIL, Gobinda Saha EMAIL, Sai Aparna Aketi EMAIL, Kaushik Roy EMAIL; all with the Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA
Pseudocode Yes Algorithm 1 IDKD framework at each node
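Algorithm 1 itself is given in the paper; as background on the distillation step that IDKD builds on, below is a minimal stdlib-Python sketch of the standard temperature-softened knowledge-distillation loss (Hinton et al., 2015). This is generic knowledge distillation, not the paper's Algorithm 1; the function names and the temperature value are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    as in standard knowledge distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical teacher and student logits give zero loss.
print(kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
```

Higher temperatures flatten both distributions so the student also matches the teacher's relative probabilities on non-target classes.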
Open Source Code Yes Code available at https://github.com/DeepakTatachar/IDKD
Open Datasets Yes For our experiments, we use the ResNet (He et al., 2016) architecture on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Imagenette (Howard, 2018) datasets. We choose Tiny ImageNet (Li et al., 2015), LSUN (Yu et al., 2015), and Uniform-Noise as the distillation datasets (public datasets).
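The paper evaluates on heterogeneously distributed (non-IID) data. A common way such splits are generated in the decentralized-learning literature (not necessarily the paper's exact recipe) is to draw per-class Dirichlet proportions across nodes; the sketch below does this with only the standard library, sampling Dirichlet vectors from gamma draws. The `alpha` value and function names are illustrative.

```python
import random
from collections import defaultdict

def dirichlet(alpha, k, rng):
    """Sample a k-dimensional Dirichlet(alpha) vector via gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def partition_non_iid(labels, num_nodes, alpha=0.3, seed=0):
    """Assign sample indices to nodes; smaller alpha gives more skewed
    (heterogeneous) per-node label distributions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    node_indices = [[] for _ in range(num_nodes)]
    for cls, idxs in by_class.items():
        rng.shuffle(idxs)
        props = dirichlet(alpha, num_nodes, rng)
        start = 0  # carve the class's shuffled indices into disjoint slices
        for node in range(num_nodes):
            count = round(props[node] * len(idxs))
            node_indices[node].extend(idxs[start:start + count])
            start += count
        # any rounding remainder goes to the last node
        node_indices[num_nodes - 1].extend(idxs[start:])
    return node_indices
```

Every index is assigned to exactly one node, so the union of the node partitions reconstructs the full dataset.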
Dataset Splits Yes For all the experiments, we used a 10% train-validation split to identify the hyperparameters. Once the hyperparameters were tuned on the validation set, we re-ran the experiment on the complete training set with the identified hyperparameters. Details of the dataset sizes are reported in Table 9.
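The split procedure quoted above (hold out 10% of the train set for tuning, then retrain on the full train set) can be sketched as follows; the function name, seed, and the CIFAR-10 size used in the usage line are illustrative, not taken from the paper's code.

```python
import random

def train_val_split(num_samples, val_fraction=0.1, seed=0):
    """Hold out val_fraction of the training indices for validation."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(num_samples * val_fraction)
    return indices[n_val:], indices[:n_val]  # (train, val)

train_idx, val_idx = train_val_split(50000)  # e.g. a CIFAR-sized train set
# tune hyperparameters on val_idx, then retrain on all 50000 indices
```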
Hardware Specification Yes For all the experiments we used a cluster of 8 nodes. 4 of these nodes were equipped with an Intel(R) Xeon(R) Silver 4114 CPU with 93 GB of usable system memory and 3 NVIDIA GeForce GTX 1080 Ti. The other 4 nodes were equipped with an Intel(R) Xeon(R) Silver 4114 CPU with 187 GB of usable main memory and 4 NVIDIA GeForce GTX 2080 Ti.
Software Dependencies Yes For communication in parallel decentralized learning, we use the MPI (Gropp et al., 2003) implementation from mpich3.2. We used the PyTorch framework (Paszke et al., 2019) for automatic differentiation of deep learning models.
Experiment Setup Yes The hyperparameters used for training the models are presented in Table 10. The hyperparameters were tuned using a validation set (10% of the train set). The final accuracy was reported by using these hyperparameters trained on the complete train set. The learning rate is denoted by LR, momentum by β, batch size by BS, the number of nodes in the graph by N, and the consensus step size by γ.
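The consensus step size γ in the notation above governs how strongly each node mixes its parameters with its neighbours' in decentralized learning. A minimal sketch of one such gossip-averaging step on a ring graph is below; the uniform mixing weights, the γ value, and the scalar parameters are illustrative, not the paper's exact update rule.

```python
def consensus_step(params, gamma=1.0):
    """One gossip step on a ring graph: each node updates
    x_i <- x_i + gamma * sum_j w_ij * (x_j - x_i),
    mixing with its two neighbours using uniform weights w_ij = 1/3."""
    n = len(params)
    w = 1.0 / 3.0
    new = []
    for i in range(n):
        left, right = params[(i - 1) % n], params[(i + 1) % n]
        new.append(params[i] + gamma * (w * (left - params[i])
                                        + w * (right - params[i])))
    return new

x = [0.0, 1.0, 2.0, 3.0]  # one scalar parameter per node
for _ in range(50):
    x = consensus_step(x, gamma=0.9)
# repeated steps drive all nodes toward the initial average (1.5)
```

Because the effective mixing matrix here is symmetric and doubly stochastic, repeated application converges to the average of the initial parameters, which is the consensus target.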