Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Homogenizing Non-IID Datasets via In-Distribution Knowledge Distillation for Decentralized Learning

Authors: Deepak Ravikumar, Gobinda Saha, Sai Aparna Aketi, Kaushik Roy

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental Our experiments on multiple image classification datasets and graph topologies show that the proposed IDKD scheme is more effective than traditional knowledge distillation and achieves state-of-the-art generalization performance on heterogeneously distributed data with minimal communication overhead. ... We evaluate the performance of the proposed IDKD methodology on various datasets and graph topologies and compare it against the current state-of-the-art decentralized learning algorithm. We chose QG-DSGDm-N (Lin et al., 2021) as our primary baseline as it achieves state-of-the-art performance on heterogeneous data
Researcher Affiliation Academia Deepak Ravikumar EMAIL, Gobinda Saha EMAIL, Sai Aparna Aketi EMAIL, Kaushik Roy EMAIL; all with the Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA
Pseudocode Yes Algorithm 1 IDKD framework at each node
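Algorithm 1 itself is given in the paper; as background on the distillation step that IDKD builds on, below is a minimal stdlib-Python sketch of the standard temperature-softened knowledge-distillation loss (Hinton et al., 2015). This is generic knowledge distillation, not the paper's Algorithm 1; the function names and the temperature value are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    as in standard knowledge distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical teacher and student logits give zero loss.
print(kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
```

Higher temperatures flatten both distributions so the student also matches the teacher's relative probabilities on non-target classes.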
Open Source Code Yes Code available at https://github.com/DeepakTatachar/IDKD
Open Datasets Yes For our experiments, we use the ResNet (He et al., 2016) architecture on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Imagenette (Howard, 2018) datasets. We choose Tiny ImageNet (Li et al., 2015), LSUN (Yu et al., 2015), and Uniform-Noise as the distillation datasets (public datasets).
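The paper evaluates on heterogeneously distributed (non-IID) data. A common way such splits are generated in the decentralized-learning literature (not necessarily the paper's exact recipe) is to draw per-class Dirichlet proportions across nodes; the sketch below does this with only the standard library, sampling Dirichlet vectors from gamma draws. The `alpha` value and function names are illustrative.

```python
import random
from collections import defaultdict

def dirichlet(alpha, k, rng):
    """Sample a k-dimensional Dirichlet(alpha) vector via gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def partition_non_iid(labels, num_nodes, alpha=0.3, seed=0):
    """Assign sample indices to nodes; smaller alpha gives more skewed
    (heterogeneous) per-node label distributions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    node_indices = [[] for _ in range(num_nodes)]
    for cls, idxs in by_class.items():
        rng.shuffle(idxs)
        props = dirichlet(alpha, num_nodes, rng)
        start = 0  # carve the class's shuffled indices into disjoint slices
        for node in range(num_nodes):
            count = round(props[node] * len(idxs))
            node_indices[node].extend(idxs[start:start + count])
            start += count
        # any rounding remainder goes to the last node
        node_indices[num_nodes - 1].extend(idxs[start:])
    return node_indices
```

Every index is assigned to exactly one node, so the union of the node partitions reconstructs the full dataset.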
Dataset Splits Yes For all the experiments, we used a 10% train-validation split to identify the hyperparameters. Once the hyperparameters were tuned on the validation set, we re-ran the experiment on the complete training set with the identified hyperparameters. Details of the dataset sizes are reported in Table 9.
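The split procedure quoted above (hold out 10% of the train set for tuning, then retrain on the full train set) can be sketched as follows; the function name, seed, and the CIFAR-10 size used in the usage line are illustrative, not taken from the paper's code.

```python
import random

def train_val_split(num_samples, val_fraction=0.1, seed=0):
    """Hold out val_fraction of the training indices for validation."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(num_samples * val_fraction)
    return indices[n_val:], indices[:n_val]  # (train, val)

train_idx, val_idx = train_val_split(50000)  # e.g. a CIFAR-sized train set
# tune hyperparameters on val_idx, then retrain on all 50000 indices
```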
Hardware Specification Yes For all the experiments we used a cluster of 8 nodes. 4 of these nodes were equipped with an Intel(R) Xeon(R) Silver 4114 CPU with 93 GB of usable system memory and 3 NVIDIA GeForce GTX 1080 Ti. The other 4 nodes were equipped with an Intel(R) Xeon(R) Silver 4114 CPU with 187 GB of usable main memory and 4 NVIDIA GeForce GTX 2080 Ti.
Software Dependencies Yes For communication in parallel decentralized learning, we use the MPI (Gropp et al., 2003) implementation from mpich3.2. We used the PyTorch framework (Paszke et al., 2019) for automatic differentiation of deep learning models.
Experiment Setup Yes The hyperparameters used for training the models are presented in Table 10. The hyperparameters were tuned using a validation set (10% of the train set). The final accuracy was reported by using these hyperparameters trained on the complete train set. The learning rate is denoted by LR, momentum by β, batch size by BS, the number of nodes in the graph by N, and the consensus step size by γ.
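The consensus step size γ in the notation above governs how strongly each node mixes its parameters with its neighbours' in decentralized learning. A minimal sketch of one such gossip-averaging step on a ring graph is below; the uniform mixing weights, the γ value, and the scalar parameters are illustrative, not the paper's exact update rule.

```python
def consensus_step(params, gamma=1.0):
    """One gossip step on a ring graph: each node updates
    x_i <- x_i + gamma * sum_j w_ij * (x_j - x_i),
    mixing with its two neighbours using uniform weights w_ij = 1/3."""
    n = len(params)
    w = 1.0 / 3.0
    new = []
    for i in range(n):
        left, right = params[(i - 1) % n], params[(i + 1) % n]
        new.append(params[i] + gamma * (w * (left - params[i])
                                        + w * (right - params[i])))
    return new

x = [0.0, 1.0, 2.0, 3.0]  # one scalar parameter per node
for _ in range(50):
    x = consensus_step(x, gamma=0.9)
# repeated steps drive all nodes toward the initial average (1.5)
```

Because the effective mixing matrix here is symmetric and doubly stochastic, repeated application converges to the average of the initial parameters, which is the consensus target.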