GIANT: Globally Improved Approximate Newton Method for Distributed Optimization

Authors: Shusen Wang, Fred Roosta, Peng Xu, Michael W. Mahoney

NeurIPS 2018

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. "We conduct large-scale experiments on a computer cluster and, empirically, demonstrate the superior performance of GIANT."
Researcher Affiliation: Academia. Shusen Wang (Stevens Institute of Technology, shusen.wang@stevens.edu); Farbod Roosta-Khorasani (University of Queensland, fred.roosta@uq.edu.au); Peng Xu (Stanford University, pengxu@stanford.edu); Michael W. Mahoney (University of California at Berkeley, mmahoney@stat.berkeley.edu).
Pseudocode: No. The paper includes a diagram (Figure 1) illustrating an iteration of GIANT, but it does not provide formal pseudocode or an algorithm block. (A hedged sketch of one iteration appears after this table.)
Open Source Code: Yes. "The Apache Spark code is available at https://github.com/wangshusen/SparkGiant.git."
Open Datasets: Yes. "We use three binary classification datasets: MNIST8M (digit 4 versus 9, thus n = 2M and d = 784), Covtype (n = 581K and d = 54), and Epsilon (n = 500K and d = 2K), which are available at the LIBSVM website."
Dataset Splits: No. The paper states "We randomly hold 80% for training and the rest for test," but does not mention a separate validation set or give further split details. (A sketch of such a split appears after this table.)
Hardware Specification: Yes. "We conduct large-scale experiments on the Cori Supercomputer maintained by NERSC, a Cray XC40 system with 1632 compute nodes, each of which has two 2.3GHz 16-core Haswell processors and 128GB of DRAM. We use up to 375 nodes (12,000 CPU cores)."
Software Dependencies: No. "We implement GIANT, Accelerated Gradient Descent (AGD) [23], Limited-memory BFGS (L-BFGS) [12], and Distributed Approximate NEwton (DANE) [36] in Scala and Apache Spark [44]." While the software is named, specific version numbers are not provided for Scala or Apache Spark.
Experiment Setup: Yes. "Our theory requires the local sample size s = n/m to be larger than d. But in practice, GIANT converges even if s is smaller than d. In this set of experiments, we set m = 89, and thus s is about half of d." (See the partition-size sketch after this table.)
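
Since the paper provides no algorithm block, the following is a minimal single-machine sketch of the iteration shown in Figure 1, assuming a ridge-regularized logistic-regression objective (the paper's experimental setting). It is illustrative only: it uses a direct linear solve where the paper uses conjugate gradient on each worker, and a unit step where the paper uses line search; every name in it is ours, not from the SparkGiant code.

```python
import numpy as np

def giant_iteration(w, X_parts, y_parts, lam):
    """One GIANT iteration, following Figure 1: (1) workers send local
    gradients and the driver averages them; (2) workers solve their
    *local* Newton systems against the global gradient and the driver
    averages the resulting directions."""
    d = w.size

    # Round 1: local gradients of the logistic loss (labels in {-1, +1}).
    grads = []
    for X, y in zip(X_parts, y_parts):
        p = 1.0 / (1.0 + np.exp(-y * (X @ w)))            # sigmoid(y * Xw)
        grads.append(X.T @ (-(1.0 - p) * y) / len(y) + lam * w)
    g = np.mean(grads, axis=0)                            # global gradient

    # Round 2: each worker forms a Hessian from only its s = n/m samples.
    dirs = []
    for X, y in zip(X_parts, y_parts):
        p = 1.0 / (1.0 + np.exp(-y * (X @ w)))
        curv = p * (1.0 - p)                              # per-sample curvature
        H = (X.T * curv) @ X / len(y) + lam * np.eye(d)   # local Hessian
        dirs.append(np.linalg.solve(H, g))                # paper: CG solve
    return w - np.mean(dirs, axis=0)                      # paper: line search
```

As a usage sketch, partition the data with X_parts = np.array_split(X, m) and y_parts = np.array_split(y, m), then call w = giant_iteration(w, X_parts, y_parts, lam=1e-3) once per communication round.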
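The 80/20 split can be reproduced in spirit with scikit-learn on LIBSVM-format files, though the paper's own pipeline is Spark and it states no random seed; the filename and seed below are our placeholders, not the paper's.

```python
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split

# Placeholder path: any of the three datasets in svmlight/LIBSVM format.
X, y = load_svmlight_file("covtype.libsvm.binary")

# Random 80% train / 20% test split, as the paper describes; no validation set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
```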
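Finally, the quoted relation s = n/m is just the per-partition row count. A small check, where n is an assumed post-split training-set size (the excerpt does not say which dataset the quoted m = 89 refers to):

```python
import numpy as np

n, m = 400_000, 89     # n is an assumed training-set size; m is from the quote
s = n // m             # local sample size s = n/m -> 4494 rows per worker here
parts = np.array_split(np.arange(n), m)
assert all(abs(len(p) - s) <= 1 for p in parts)  # every worker holds ~s rows
```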