Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Homogenizing Non-IID Datasets via In-Distribution Knowledge Distillation for Decentralized Learning
Authors: Deepak Ravikumar, Gobinda Saha, Sai Aparna Aketi, Kaushik Roy
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on multiple image classification datasets and graph topologies show that the proposed IDKD scheme is more effective than traditional knowledge distillation and achieves state-of-the-art generalization performance on heterogeneously distributed data with minimal communication overhead. ... We evaluate the performance of the proposed IDKD methodology on various datasets and graph topologies and compare it against the current state-of-the-art decentralized learning algorithm. We chose QG-DSGDm-N (Lin et al., 2021) as our primary baseline as it achieves state-of-the-art performance on heterogeneous data. |
| Researcher Affiliation | Academia | Deepak Ravikumar, Gobinda Saha, Sai Aparna Aketi, Kaushik Roy — Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA |
| Pseudocode | Yes | Algorithm 1 IDKD framework at each node |
| Open Source Code | Yes | Code available at https://github.com/DeepakTatachar/IDKD |
| Open Datasets | Yes | For our experiments, we use ResNet (He et al., 2016) architecture on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Imagenette (Howard, 2018) datasets. We choose Tiny ImageNet (Li et al., 2015), LSUN (Yu et al., 2015), and Uniform-Noise as the distillation datasets (public dataset). |
| Dataset Splits | Yes | For all the experiments, we used a 10% train-validation split to identify the hyperparameters. Once the hyperparameters were tuned on the validation set, we re-ran the experiment on the complete training set with the identified hyperparameters. Details of the dataset sizes are reported in Table 9. |
| Hardware Specification | Yes | For all the experiments we used a cluster of 8 nodes. 4 of these nodes were equipped with an Intel(R) Xeon(R) Silver 4114 CPU with 93 GB of usable system memory and 3 NVIDIA GeForce GTX 1080 Ti. The other 4 nodes were equipped with an Intel(R) Xeon(R) Silver 4114 CPU with 187 GB of usable main memory and 4 NVIDIA GeForce GTX 2080 Ti. |
| Software Dependencies | Yes | For communication, we use the MPI (Gropp et al., 2003) implementation from MPICH 3.2 for parallel decentralized learning. We used the PyTorch framework (Paszke et al., 2019) for automatic differentiation of deep learning models. |
| Experiment Setup | Yes | The hyperparameters used for training the models are presented in Table 10. The hyperparameters were tuned using a validation set (10% of the train set). The final accuracy was reported by training on the complete train set with these hyperparameters. The learning rate is denoted by LR, momentum by β, batch size by BS, the number of nodes in the graph by N, and the consensus step size by γ. |
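The tuning protocol described in the Dataset Splits and Experiment Setup rows (hold out 10% of the train set for validation, select hyperparameters, then retrain on the full train set) can be sketched as follows. This is an illustrative sketch only, not the authors' code; the `evaluate` callback and the candidate learning rates are hypothetical placeholders.

```python
import random

def train_val_split(indices, val_fraction=0.1, seed=0):
    """Return (train_idx, val_idx) with ~val_fraction of samples held out."""
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def tune_then_retrain(indices, candidate_lrs, evaluate):
    """Pick the hyperparameter with the best validation score, then
    return it together with the complete training set for the final run."""
    train_idx, val_idx = train_val_split(indices)
    best_lr = max(candidate_lrs,
                  key=lambda lr: evaluate(lr, train_idx, val_idx))
    # Final accuracy is reported after retraining on the full train set.
    return best_lr, indices

# Toy usage with a dummy evaluator that prefers lr = 0.1.
best_lr, full_train_set = tune_then_retrain(
    list(range(100)),
    candidate_lrs=[0.01, 0.1, 1.0],
    evaluate=lambda lr, tr, va: -abs(lr - 0.1),
)
```

The key point of the protocol is that the validation split is used only for hyperparameter selection; the reported numbers come from a separate run on the complete training set.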