Training Neural Networks Without Gradients: A Scalable ADMM Approach

Authors: Gavin Taylor, Ryan Burmeister, Zheng Xu, Bharat Singh, Ankit Patel, Tom Goldstein

Venue: ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we present experimental results that compare the performance of the ADMM method to other approaches, including SGD, conjugate gradients, and LBFGS on benchmark classification tasks."
Researcher Affiliation | Academia | "United States Naval Academy, Annapolis, MD USA; University of Maryland, College Park, MD USA; Rice University, Houston, TX USA"
Pseudocode | Yes | "Algorithm 1 ADMM for Neural Nets" (a hedged sketch of the algorithm's alternating updates follows this table)
Open Source Code | No | The paper does not provide any concrete access information for the source code, such as a repository link or an explicit statement about code release.
Open Datasets | Yes | "The first is a subset of the Street View House Numbers (SVHN) dataset (Netzer et al., 2011)." and "The second dataset is the far more difficult Higgs dataset (Baldi et al., 2014)."
Dataset Splits | Yes | "Using the extra dataset to train, this meant 120,290 training datapoints of 648 features each. The testing set contained 5,893 data points." and "The second dataset is the far more difficult Higgs dataset (Baldi et al., 2014), consisting of a training set of 10,500,000 datapoints of 28 features each... The testing set consists of 500,000 datapoints." (a loading/split sketch for the Higgs data follows this table)
Hardware Specification | Yes | "The new ADMM approach was implemented in Python on a Cray XC30 supercomputer with Ivy Bridge processors, and communication between cores performed via MPI. SGD, conjugate gradients, and L-BFGS are run as implemented in the Torch optim package on NVIDIA Tesla K40 GPUs."
Software Dependencies | No | "The new ADMM approach was implemented in Python... SGD, conjugate gradients, and L-BFGS are run as implemented in the Torch optim package." The paper names the software but does not give version numbers.
Experiment Setup | Yes | "We choose γi = 10 and βi = 1 for all trial runs reported here... We use training data with binary class labels... We use a separable loss function with a hinge penalty..." (a sketch of the hinge-penalty output update with these settings follows this table)
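
The paper's Algorithm 1 alternates closed-form updates over the weights W_l, activations a_l, pre-activations z_l, and a Lagrange multiplier attached to the output layer. Below is a minimal single-machine NumPy sketch of one such sweep for a one-hidden-layer ReLU network; the toy data, layer sizes, squared output penalty, and variable names are illustrative assumptions (the paper's experiments use a hinge output penalty, sketched further below, and a distributed MPI implementation).

```python
# Minimal sketch of the alternating updates in Algorithm 1 (ADMM for neural
# nets). Toy data, layer sizes, the squared output penalty, and all names are
# assumptions; the paper's experiments use a hinge output penalty and MPI.
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: N samples stored as columns, d0 input features, +/-1 labels.
N, d0, d1, d2 = 200, 20, 50, 1
a0 = rng.standard_normal((d0, N))            # input activations
y = np.sign(rng.standard_normal((1, N)))     # +/-1 labels

relu = lambda z: np.maximum(z, 0.0)

# Penalty weights (the paper reports gamma_i = 10 and beta_i = 1 throughout).
gamma, beta = 10.0, 1.0

# Initialise weights, pre-activations z_l, activations a_l, and the multiplier.
W1 = rng.standard_normal((d1, d0)) * 0.1
W2 = rng.standard_normal((d2, d1)) * 0.1
z1 = W1 @ a0
a1 = relu(z1)
z2 = W2 @ a1
lam = np.zeros_like(z2)                      # Lagrange multiplier on z2

def hidden_z_update(a, m, gamma, beta):
    """Entrywise argmin_z  gamma*(a - relu(z))**2 + beta*(z - m)**2."""
    f = lambda z: gamma * (a - relu(z)) ** 2 + beta * (z - m) ** 2
    # Candidate on the z >= 0 branch, where relu(z) = z.
    z_pos = np.maximum((gamma * a + beta * m) / (gamma + beta), 0.0)
    # Candidate on the z <= 0 branch, where relu(z) = 0.
    z_neg = np.minimum(m, 0.0)
    return np.where(f(z_pos) <= f(z_neg), z_pos, z_neg)

for it in range(50):
    # --- Hidden layer ---
    W1 = z1 @ np.linalg.pinv(a0)             # least-squares weight update
    # a1 couples the layer-1 activation penalty with the layer-2 weight penalty.
    A = beta * W2.T @ W2 + gamma * np.eye(d1)
    a1 = np.linalg.solve(A, beta * W2.T @ z2 + gamma * relu(z1))
    z1 = hidden_z_update(a1, W1 @ a0, gamma, beta)

    # --- Output layer ---
    W2 = z2 @ np.linalg.pinv(a1)
    m2 = W2 @ a1
    # Simplified output update: argmin_z (z - y)**2 + lam*z + beta*(z - m2)**2.
    z2 = (2 * y - lam + 2 * beta * m2) / (2 + 2 * beta)
    # Dual ascent step on the Lagrange multiplier.
    lam = lam + beta * (z2 - m2)

print("train accuracy:", np.mean(np.sign(W2 @ relu(W1 @ a0)) == y))
```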
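
For the Higgs split, the quoted numbers (10,500,000 training rows, 500,000 test rows, 28 features) match the UCI HIGGS release, where the last 500,000 rows form the test set. The sketch below reproduces that split under those assumptions; the file name, the use of pandas, and the {0, 1} to {-1, +1} label mapping are not taken from the paper.

```python
# Hedged loading/split sketch for the UCI HIGGS release (11,000,000 rows;
# column 0 is the 0/1 label, columns 1..28 are the features). File name,
# pandas usage, and the label mapping are assumptions, not the authors' code.
import pandas as pd

df = pd.read_csv("HIGGS.csv.gz", header=None)

# First 10,500,000 rows for training, last 500,000 for testing (as reported).
train, test = df.iloc[:10_500_000], df.iloc[10_500_000:]
X_train, y_train = train.iloc[:, 1:].to_numpy(), train.iloc[:, 0].to_numpy()
X_test, y_test = test.iloc[:, 1:].to_numpy(), test.iloc[:, 0].to_numpy()

# The experiments use binary class labels with a hinge penalty, so map
# {0, 1} -> {-1, +1} (mapping chosen here for illustration).
y_train, y_test = 2 * y_train - 1, 2 * y_test - 1

print(X_train.shape, X_test.shape)   # (10500000, 28), (500000, 28)
```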
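
The experiment setup row quotes γi = 10, βi = 1, binary class labels, and a separable hinge output penalty. Below is a hedged sketch of the resulting entrywise output-layer z update, including the Lagrange-multiplier term; the function name and the standalone framing are assumptions, since in Algorithm 1 this solve happens inside the ADMM sweep shown above.

```python
# Hedged sketch of the output-layer z update under a hinge penalty, with the
# reported gamma_i = 10, beta_i = 1 settings recorded as constants.
import numpy as np

GAMMA = 10.0  # gamma_i from the paper; weights the hidden activation penalties
BETA = 1.0    # beta_i from the paper; weights the pre-activation penalties

def hinge_output_update(y, m, lam, beta=BETA):
    """Entrywise argmin_z  max(0, 1 - y*z) + lam*z + beta*(z - m)**2,
    for labels y in {-1, +1}, where m = W_L a_{L-1}."""
    f = lambda z: np.maximum(0.0, 1.0 - y * z) + lam * z + beta * (z - m) ** 2
    # Branch with the hinge inactive (y*z >= 1): quadratic minimiser,
    # clipped onto the branch.
    z_flat = m - lam / (2.0 * beta)
    z_flat = np.where(y > 0, np.maximum(z_flat, 1.0), np.minimum(z_flat, -1.0))
    # Branch with the hinge active (y*z <= 1): minimiser of
    # 1 - y*z + lam*z + beta*(z - m)**2, clipped onto the branch.
    z_act = m + (y - lam) / (2.0 * beta)
    z_act = np.where(y > 0, np.minimum(z_act, 1.0), np.maximum(z_act, -1.0))
    # The global minimiser is whichever branch candidate scores lower.
    return np.where(f(z_flat) <= f(z_act), z_flat, z_act)

# Example: labels, current linear outputs m = W_L a_{L-1}, and multipliers.
y = np.array([1.0, -1.0, 1.0, -1.0])
m = np.array([0.2, 0.5, 2.0, -3.0])
lam = np.zeros(4)
print(hinge_output_update(y, m, lam))
```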