Decentralized Deep Learning with Arbitrary Communication Compression
Authors: Anastasia Koloskova*, Tao Lin*, Sebastian U. Stich, Martin Jaggi
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the practical performance of the algorithm in two key scenarios: the training of deep learning models (i) over decentralized user devices, connected by a peer-to-peer network and (ii) in a datacenter. |
| Researcher Affiliation | Academia | Anastasia Koloskova anastasia.koloskova@epfl.ch Tao Lin tao.lin@epfl.ch Sebastian U. Stich sebastian.stich@epfl.ch Martin Jaggi martin.jaggi@epfl.ch EPFL Lausanne, Switzerland |
| Pseudocode | Yes | Algorithm 1 CHOCO-SGD (Koloskova et al., 2019) ... Algorithm 2 CHOCO-SGD with Momentum (a minimal sketch of the CHOCO-SGD update appears below the table) |
| Open Source Code | Yes | Our implementations are open-source and available at https://github.com/epfml/ChocoSGD. |
| Open Datasets | Yes | Cifar10 dataset (50K/10K training/test samples) (Krizhevsky, 2012)... ImageNet-1k (1.28M/50K training/validation) (Deng et al., 2009)... WikiText-2 (600 training and 60 validation articles with a total of 2 088 628 and 217 646 tokens respectively) (Merity et al., 2016). |
| Dataset Splits | Yes | Cifar10 dataset (50K/10K training/test samples)... ImageNet-1k (1.28M/50K training/validation)... WikiText-2 (600 training and 60 validation articles with a total of 2 088 628 and 217 646 tokens respectively) |
| Hardware Specification | Yes | We perform our experiments on 8 machines (n1-standard-32 from Google Cloud with the Intel Ivy Bridge CPU platform), where each machine has 4 Tesla P100 GPUs and the machines are interconnected via 10 Gbps Ethernet. |
| Software Dependencies | No | The paper mentions common deep learning software such as PyTorch and standard model architectures such as ResNet, but does not provide specific version numbers for any key software components or dependencies required for reproducibility. |
| Experiment Setup | Yes | For all algorithms we fine-tune the initial learning rate and gradually warm it up from a relatively small value (0.1) (Goyal et al., 2017) for the first 5 epochs. The learning rate is decayed by 10 twice, at 150 and 225 epochs, and training stops at 300 epochs. For CHOCO-SGD and DeepSqueeze the consensus learning rate γ is also tuned. The detailed hyper-parameter tuning procedure is described in Appendix F. ... Table 4 reports the fine-tuned hyperparameters of CHOCO-SGD for training ResNet-20 on Cifar10, while Table 6 reports the fine-tuned hyperparameters of the baselines. Table 5 reports the fine-tuned hyperparameters of CHOCO-SGD for training ResNet-20/LSTM on a social network topology. (A sketch of this learning-rate schedule also appears below the table.) |
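
To make the pseudocode row concrete, here is a minimal single-process NumPy sketch in the spirit of Algorithm 1 (CHOCO-SGD): a local SGD step, compression of the difference to the publicly known estimate, exchange of the compressed message, and a gossip/consensus step. The toy quadratic objective, ring topology, top-k compression operator, step sizes, and helper names (`top_k`, `choco_sgd`) are illustrative assumptions for this sketch, not the authors' implementation or tuned settings; their actual PyTorch code is in the linked ChocoSGD repository.

```python
# Minimal single-process simulation of a CHOCO-SGD-style update on a toy
# quadratic objective. All constants below are illustrative assumptions.
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude coordinates of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def choco_sgd(A_list, b_list, W, steps=500, eta=0.05, gamma=0.5, k=2):
    """A_list/b_list: per-node least-squares data; W: row-stochastic mixing matrix;
    eta: SGD step size; gamma: consensus (gossip) learning rate; k: sparsification level."""
    n, d = len(A_list), A_list[0].shape[1]
    x = np.zeros((n, d))        # local models x_i
    x_hat = np.zeros((n, d))    # publicly known estimates \hat{x}_i
    for _ in range(steps):
        # local gradient step (full gradient here for simplicity, stochastic in the paper)
        grads = np.stack([A_list[i].T @ (A_list[i] @ x[i] - b_list[i]) for i in range(n)])
        x = x - eta * grads
        # compress the difference to the public estimate and "transmit" it
        q = np.stack([top_k(x[i] - x_hat[i], k) for i in range(n)])
        x_hat = x_hat + q
        # gossip/consensus step on the public estimates:
        # x_i += gamma * sum_j w_ij (x_hat_j - x_hat_i)
        x = x + gamma * (W @ x_hat - x_hat)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 4, 5
    A_list = [rng.standard_normal((10, d)) for _ in range(n)]
    b_list = [rng.standard_normal(10) for _ in range(n)]
    W = np.zeros((n, n))        # ring topology mixing matrix
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    print(choco_sgd(A_list, b_list, W).mean(axis=0))
```

The key design point the sketch tries to show is that only the compressed difference `q` is ever communicated, while the consensus step operates on the shared estimates `x_hat` rather than on the exact local models.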
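The experiment-setup row describes a warm-up followed by a step decay. Below is a hedged sketch of that schedule as a plain Python function; `base_lr` is a placeholder for the fine-tuned initial learning rate, since the quote only fixes the warm-up start of 0.1, the 5-epoch warm-up, the decay epochs 150 and 225, and the 300-epoch budget.

```python
# Hedged sketch of the learning-rate schedule described in the setup row:
# linear warm-up from a small value over the first 5 epochs, then a step
# decay by a factor of 10 at epochs 150 and 225, training until epoch 300.
# base_lr is a placeholder, not the paper's tuned value.
def learning_rate(epoch, base_lr=1.0, warmup_start=0.1, warmup_epochs=5):
    if epoch < warmup_epochs:
        # linear warm-up from warmup_start to base_lr
        return warmup_start + (base_lr - warmup_start) * epoch / warmup_epochs
    if epoch < 150:
        return base_lr
    if epoch < 225:
        return base_lr / 10
    return base_lr / 100

# example: inspect the schedule around the warm-up and decay boundaries
for e in (0, 4, 5, 149, 150, 224, 225, 299):
    print(e, learning_rate(e))
```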