Consensus Control for Decentralized Deep Learning

Authors: Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian Stich

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in earlier works reveal that, even in a data-center setup, decentralized training often suffers from the degradation in the quality of the model: the training and test performance of models trained in a decentralized fashion is in general worse than that of models trained in a centralized fashion... We empirically validate that the relation between generalization performance and consensus distance is consistent with this theoretical observation. Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop. To this end, we provide practical training guidelines and exemplify its effectiveness on the data-center setup as the important first step. (A sketch of how the consensus distance can be measured appears after the table.)
Researcher Affiliation | Academia | EPFL, Lausanne, Switzerland.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | No | The paper does not include any unambiguous statement about releasing source code for the methodology described, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | (1) Image Classification for CIFAR-10 (Krizhevsky & Hinton, 2009) and ImageNet-32 (i.e. image resolution of 32) (Chrabaszcz et al., 2017); and (2) Neural Machine Translation for the Multi30k dataset (Elliott et al., 2016).
Dataset Splits | No | The paper mentions using CIFAR-10, ImageNet-32, and Multi30k datasets and refers to a "standard data augmentation and preprocessing scheme". However, it does not explicitly specify exact train/validation/test split percentages, absolute sample counts for each split, or reference predefined splits with explicit citations for reproducibility.
Hardware Specification | Yes | It takes 7h to finish 1 round of standard ImageNet-32 training with n = 16 V100 on a ring, and the cost increases to e.g. 12h for our consensus distance controlled experiments.
Software Dependencies | No | The paper mentions using standard optimizers (SGD, Adam) and model architectures (ResNet, Transformer) but does not provide specific version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | We use mini-batch SGD with a Nesterov momentum of 0.9 without dampening for image classification task... and Adam is used for neural machine translation task. ... the models are trained for 300 and 90 epochs for CIFAR-10 and ImageNet-32 respectively; the local mini-batch sizes are set to 32 and 64. ... The learning rate is always gradually warmed up from a relatively small value (i.e. 0.1) for the first 5 epochs. Besides, the learning rate will be divided by 10 when the model has accessed specified fractions of the total number of training samples ({1/9} for CIFAR and ImageNet respectively). (An illustrative PyTorch sketch of this setup also appears after the table.)
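
For context on the Research Type row: the consensus distance measures how far the individual workers' models have drifted from their network-wide average. The snippet below is only a minimal PyTorch sketch of one common way to compute such a quantity (mean squared L2 distance of each worker's flattened parameters to the averaged model); the function name and the toy tensors are illustrative and not taken from the authors' code.

```python
import torch

def consensus_distance(worker_params: list) -> torch.Tensor:
    """Mean squared L2 distance of each worker's flattened parameter
    vector from the averaged ("consensus") model. Illustrative definition,
    not the authors' implementation."""
    stacked = torch.stack([p.flatten() for p in worker_params])  # (n_workers, d)
    mean = stacked.mean(dim=0, keepdim=True)                     # averaged model
    return ((stacked - mean) ** 2).sum(dim=1).mean()

# Example with 16 toy workers, matching the n = 16 ring mentioned above.
workers = [torch.randn(1000) for _ in range(16)]
print(consensus_distance(workers))
```

The Experiment Setup row can likewise be made concrete with a short PyTorch sketch of the quoted optimizer and learning-rate schedule for CIFAR-10. The Nesterov momentum of 0.9 without dampening, the 0.1 learning-rate value, the 5-epoch warmup, the divide-by-10 decay, and the 300-epoch / batch-size-32 figures come from the quote; the resnet18 architecture, the linear warmup shape, and the milestone epochs are placeholder assumptions, since the exact decay fractions are not legible in the extracted text.

```python
import torch
from torchvision.models import resnet18  # placeholder architecture

model = resnet18(num_classes=10)  # CIFAR-10 setting
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, dampening=0.0, nesterov=True
)

base_lr = 0.1
warmup_epochs = 5        # "warmed up ... for the first 5 epochs"
total_epochs = 300       # CIFAR-10; 90 for ImageNet-32 per the quote
milestones = (150, 225)  # placeholder decay epochs; exact fractions not legible above

def lr_at(epoch: int) -> float:
    """Linear warmup over the first 5 epochs (assumed shape), then
    divide the learning rate by 10 at each milestone."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr * 0.1 ** sum(epoch >= m for m in milestones)

for epoch in range(total_epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(epoch)
    # ... iterate over local mini-batches of size 32 and call optimizer.step() here ...
```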