GIANT: Globally Improved Approximate Newton Method for Distributed Optimization
Authors: Shusen Wang, Fred Roosta, Peng Xu, Michael W. Mahoney
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct large-scale experiments on a computer cluster and, empirically, demonstrate the superior performance of GIANT. |
| Researcher Affiliation | Academia | Shusen Wang (Stevens Institute of Technology, shusen.wang@stevens.edu); Farbod Roosta-Khorasani (University of Queensland, fred.roosta@uq.edu.au); Peng Xu (Stanford University, pengxu@stanford.edu); Michael W. Mahoney (University of California at Berkeley, mmahoney@stat.berkeley.edu) |
| Pseudocode | No | The paper includes a diagram (Figure 1) illustrating an iteration of GIANT, but it does not provide formal pseudocode or an algorithm block. |
| Open Source Code | Yes | The Apache Spark code is available at https://github.com/wangshusen/SparkGiant.git. |
| Open Datasets | Yes | We use three binary classification datasets: MNIST8M (digit 4 versus 9, thus n = 2M and d = 784), Covtype (n = 581K and d = 54), and Epsilon (n = 500K and d = 2K), which are available at the LIBSVM website. |
| Dataset Splits | No | The paper states 'We randomly hold 80% for training and the rest for test,' but it does not explicitly mention a separate validation set or its split details. |
| Hardware Specification | Yes | We conduct large-scale experiments on the Cori Supercomputer maintained by NERSC, a Cray XC40 system with 1632 compute nodes, each of which has two 2.3GHz 16-core Haswell processors and 128GB of DRAM. We use up to 375 nodes (12,000 CPU cores). |
| Software Dependencies | No | We implement GIANT, Accelerated Gradient Descent (AGD) [23], Limited memory BFGS (LBFGS) [12], and Distributed Approximate NEwton (DANE) [36] in Scala and Apache Spark [44]. While software is named, specific version numbers are not provided for Scala or Apache Spark. |
| Experiment Setup | Yes | Our theory requires the local sample size s = n/m to be larger than d. But in practice, GIANT converges even if s is smaller than d. In this set of experiments, we set m = 89, and thus s is about half of d. |
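
The Open Datasets and Dataset Splits rows describe loading LIBSVM-format data and randomly holding out 80% of the examples for training. Below is a minimal Scala/Spark sketch of that preprocessing, matching the paper's Scala/Apache Spark implementation only in spirit; the dataset path, seed, and application name are placeholders and not taken from the paper.

```scala
import org.apache.spark.sql.SparkSession

object GiantDataPrep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("giant-data-prep")   // placeholder application name
      .getOrCreate()

    // Load one of the LIBSVM-format datasets (e.g. Covtype); the path is a placeholder.
    val data = spark.read.format("libsvm").load("data/covtype.libsvm")

    // Randomly hold out 80% for training and the rest for test, as quoted in the table.
    // The random seed is arbitrary and not specified in the paper.
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    println(s"train examples: ${train.count()}, test examples: ${test.count()}")
    spark.stop()
  }
}
```

The paper's own Spark implementation is available at the GitHub link in the table; this sketch only illustrates the split described there.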
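The Experiment Setup row relates the per-worker (local) sample size to the partition count via s = n/m. The snippet below makes that relation concrete with illustrative numbers; n and d here are hypothetical and chosen only so that s comes out near d/2, while m = 89 is the value quoted from the setup.

```scala
object LocalSampleSize {
  def main(args: Array[String]): Unit = {
    // Illustrative values only; n and d below are NOT taken from the paper.
    val n = 1000000L        // total number of training examples in the cluster
    val d = 20000           // feature dimension
    val m = 89              // number of workers/partitions, as in the quoted setup

    // Each worker holds roughly s = n / m local samples.
    val s = n.toDouble / m
    println(f"local sample size s = $s%.0f, d = $d, s > d: ${s > d}")
    // The theory assumes s > d, but the experiments report convergence even when s < d.
  }
}
```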