Deep learning with Elastic Averaging SGD

Authors: Sixin Zhang, Anna E. Choromanska, Yann LeCun

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments demonstrate that the new algorithm accelerates the training of deep architectures compared to DOWNPOUR and other common baseline approaches and furthermore is very communication efficient. |
| Researcher Affiliation | Collaboration | Sixin Zhang (Courant Institute, NYU, zsx@cims.nyu.edu); Anna Choromanska (Courant Institute, NYU, achoroma@cims.nyu.edu); Yann LeCun (Center for Data Science, NYU & Facebook AI Research, yann@cims.nyu.edu) |
| Pseudocode | Yes | Algorithm 1: Asynchronous EASGD: Processing by worker i and the master |
| Open Source Code | Yes | Our implementation is available at https://github.com/sixin-zh/mpiT. |
| Open Datasets | Yes | We perform experiments in a deep learning setting on two benchmark datasets: CIFAR-10 (we refer to it as CIFAR) [4] and ImageNet ILSVRC 2013 (we refer to it as ImageNet) [5]. (...) [4] Downloaded from http://www.cs.toronto.edu/~kriz/cifar.html. [5] Downloaded from http://image-net.org/challenges/LSVRC/2013. |
| Dataset Splits | No | The paper mentions using the CIFAR and ImageNet datasets but does not explicitly specify the training/validation/test splits (e.g., percentages or counts) in the main text. While these datasets have standard splits, the paper does not state how it utilized them for training, validation, and testing. |
| Hardware Specification | Yes | For all our experiments we use a GPU-cluster interconnected with InfiniBand. Each node has 4 Titan GPU processors where each local worker corresponds to one GPU processor. |
| Software Dependencies | No | The paper does not provide a reproducible description of ancillary software with specific version numbers. It mentions deep learning frameworks but gives no versions for its own implementation. |
| Experiment Setup | Yes | We add ℓ2-regularization (λ/2)‖x‖² to the loss function F(x). For ImageNet we use λ = 10^-5 and for CIFAR we use λ = 10^-4. We also compute the stochastic gradient using mini-batches of sample size 128. (...) For all experiments in this section we use EASGD with β = 0.98, for all momentum-based methods we set the momentum term δ = 0.99, and finally for MVADOWNPOUR we set the moving rate to α = 0.001. |
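
The pseudocode row above refers to Algorithm 1 (asynchronous EASGD), in which each worker runs local SGD and, every τ steps, exchanges an elastic difference with a master-held center variable. The single-process Python sketch below illustrates that update rule on a toy quadratic objective; the round-robin scheduling, the toy loss, and all names are assumptions for illustration only, not the paper's MPI/GPU implementation, and the hyperparameter values are simply the ones quoted in the setup row.

```python
import numpy as np

# Single-process sketch of the asynchronous EASGD round (Algorithm 1).
# Illustration only: a toy quadratic loss and round-robin "workers"
# stand in for the paper's one-GPU-per-worker MPI processes.

rng = np.random.default_rng(0)

dim, p = 10, 4                   # parameter dimension, number of local workers
eta, tau = 0.01, 4               # learning rate, communication period
beta = 0.98                      # elastic parameter quoted in the setup row above
alpha = beta / p                 # moving rate (the paper couples beta = p * alpha)
lam = 1e-4                       # l2-regularization (the CIFAR value quoted above)

x_star = rng.normal(size=dim)    # optimum of the toy objective

def stochastic_grad(x):
    """Noisy gradient of 0.5*||x - x_star||^2 plus the l2 penalty."""
    return (x - x_star) + lam * x + 0.1 * rng.normal(size=dim)

x_tilde = np.zeros(dim)                       # center (master) variable
workers = [np.zeros(dim) for _ in range(p)]   # local variables x_i
clocks = [0] * p                              # local clocks t_i

for step in range(4000):
    i = step % p                              # round-robin stand-in for asynchrony
    x = workers[i].copy()                     # snapshot used for the gradient
    if clocks[i] % tau == 0:                  # elastic exchange with the master
        diff = alpha * (x - x_tilde)
        workers[i] = workers[i] - diff        # worker is pulled toward the center
        x_tilde = x_tilde + diff              # center is pulled toward the worker
    workers[i] = workers[i] - eta * stochastic_grad(x)   # local SGD step
    clocks[i] += 1

print("center-to-optimum distance:", np.linalg.norm(x_tilde - x_star))
```

In the paper's actual setup each local worker is one Titan GPU process exchanging these updates with the master over the InfiniBand-connected cluster, and the workers run truly asynchronously rather than in a fixed order.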
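
The setup row also quotes a momentum term δ = 0.99 for the momentum-based methods. As a hypothetical illustration of where such a term enters, the sketch below applies a Nesterov-style momentum step to the local worker and keeps the same elastic pull toward the center variable; the exact coupling and all names are assumptions (consult the paper's EAMSGD/DOWNPOUR variants for the precise rules), and the elastic term is applied every step here only for brevity.

```python
import numpy as np

# Hypothetical momentum-based local worker step: a Nesterov-style
# look-ahead gradient with momentum delta = 0.99 (the value quoted above),
# combined with an elastic pull toward the center variable x_tilde.
# The precise EAMSGD update in the paper may differ; this is a sketch.

delta, eta, alpha = 0.99, 0.01, 0.245   # momentum, learning rate, moving rate

def momentum_worker_step(x, v, x_tilde, grad_fn):
    """One local update: momentum step plus elastic attraction to x_tilde."""
    v_new = delta * v - eta * grad_fn(x + delta * v)   # look-ahead gradient
    x_new = x + v_new - alpha * (x - x_tilde)          # elastic attraction
    return x_new, v_new

# Toy usage on a quadratic objective centered at the origin.
grad = lambda x: x
x, v, x_tilde = np.ones(5), np.zeros(5), np.zeros(5)
for _ in range(200):
    x, v = momentum_worker_step(x, v, x_tilde, grad)
print("worker norm after 200 steps:", np.linalg.norm(x))   # shrinks toward 0
```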