GIANT: Globally Improved Approximate Newton Method for Distributed Optimization

Authors: Shusen Wang, Fred Roosta, Peng Xu, Michael W. Mahoney

NeurIPS 2018

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. "We conduct large-scale experiments on a computer cluster and, empirically, demonstrate the superior performance of GIANT."
Researcher Affiliation: Academia. Shusen Wang (Stevens Institute of Technology, shusen.wang@stevens.edu); Farbod Roosta-Khorasani (University of Queensland, fred.roosta@uq.edu.au); Peng Xu (Stanford University, pengxu@stanford.edu); Michael W. Mahoney (University of California at Berkeley, mmahoney@stat.berkeley.edu).
Pseudocode: No. The paper includes a diagram (Figure 1) illustrating an iteration of GIANT, but it does not provide formal pseudocode or an algorithm block. (A hedged sketch of one iteration appears after this table.)
Open Source Code: Yes. "The Apache Spark code is available at https://github.com/wangshusen/SparkGiant.git."
Open Datasets: Yes. "We use three binary classification datasets: MNIST8M (digit 4 versus 9, thus n = 2M and d = 784), Covtype (n = 581K and d = 54), and Epsilon (n = 500K and d = 2K), which are available at the LIBSVM website."
Dataset Splits: No. The paper states "We randomly hold 80% for training and the rest for test," but does not mention a separate validation set or give further split details. (A sketch of such a split appears after this table.)
Hardware Specification: Yes. "We conduct large-scale experiments on the Cori Supercomputer maintained by NERSC, a Cray XC40 system with 1632 compute nodes, each of which has two 2.3GHz 16-core Haswell processors and 128GB of DRAM. We use up to 375 nodes (12,000 CPU cores)."
Software Dependencies: No. "We implement GIANT, Accelerated Gradient Descent (AGD) [23], Limited-memory BFGS (L-BFGS) [12], and Distributed Approximate NEwton (DANE) [36] in Scala and Apache Spark [44]." While the software is named, specific version numbers are not provided for Scala or Apache Spark.
Experiment Setup: Yes. "Our theory requires the local sample size s = n/m to be larger than d. But in practice, GIANT converges even if s is smaller than d. In this set of experiments, we set m = 89, and thus s is about half of d." (See the partition-size sketch after this table.)
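
Since the paper provides no algorithm block, the following is a minimal single-machine sketch of the iteration shown in Figure 1, assuming a ridge-regularized logistic-regression objective (the paper's experimental setting). It is illustrative only: it uses a direct linear solve where the paper uses conjugate gradient on each worker, and a unit step where the paper uses line search; every name in it is ours, not from the SparkGiant code.

```python
import numpy as np

def giant_iteration(w, X_parts, y_parts, lam):
    """One GIANT iteration, following Figure 1: (1) workers send local
    gradients and the driver averages them; (2) workers solve their
    *local* Newton systems against the global gradient and the driver
    averages the resulting directions."""
    d = w.size

    # Round 1: local gradients of the logistic loss (labels in {-1, +1}).
    grads = []
    for X, y in zip(X_parts, y_parts):
        p = 1.0 / (1.0 + np.exp(-y * (X @ w)))            # sigmoid(y * Xw)
        grads.append(X.T @ (-(1.0 - p) * y) / len(y) + lam * w)
    g = np.mean(grads, axis=0)                            # global gradient

    # Round 2: each worker forms a Hessian from only its s = n/m samples.
    dirs = []
    for X, y in zip(X_parts, y_parts):
        p = 1.0 / (1.0 + np.exp(-y * (X @ w)))
        curv = p * (1.0 - p)                              # per-sample curvature
        H = (X.T * curv) @ X / len(y) + lam * np.eye(d)   # local Hessian
        dirs.append(np.linalg.solve(H, g))                # paper: CG solve
    return w - np.mean(dirs, axis=0)                      # paper: line search
```

As a usage sketch, partition the data with X_parts = np.array_split(X, m) and y_parts = np.array_split(y, m), then call w = giant_iteration(w, X_parts, y_parts, lam=1e-3) once per communication round.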
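The 80/20 split can be reproduced in spirit with scikit-learn on LIBSVM-format files, though the paper's own pipeline is Spark and it states no random seed; the filename and seed below are our placeholders, not the paper's.

```python
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split

# Placeholder path: any of the three datasets in svmlight/LIBSVM format.
X, y = load_svmlight_file("covtype.libsvm.binary")

# Random 80% train / 20% test split, as the paper describes; no validation set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
```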
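Finally, the quoted relation s = n/m is just the per-partition row count. A small check, where n is an assumed post-split training-set size (the excerpt does not say which dataset the quoted m = 89 refers to):

```python
import numpy as np

n, m = 400_000, 89     # n is an assumed training-set size; m is from the quote
s = n // m             # local sample size s = n/m -> 4494 rows per worker here
parts = np.array_split(np.arange(n), m)
assert all(abs(len(p) - s) <= 1 for p in parts)  # every worker holds ~s rows
```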