DINO: Distributed Newton-Type Optimization Method
Authors: Rixon Crane, Fred Roosta
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we examine the empirical performance of DINO in comparison to the, previously discussed, distributed second-order methods DINGO, DiSCO, GIANT, Inexact DANE and AIDE. We also compare these to synchronous SGD (Chen et al., 2016). In all experiments, we consider (1) with (2), where S1, . . . , Sm partition {1, . . . , n} with each having equal size n/m. In Table 3 and Figures 1 and 2, we compare performance on the strongly convex problem of softmax cross-entropy minimization with regularization on the EMNIST Digits dataset. In Figure 3, we consider the non-convex problem of non-linear least-squares without regularization on the CIFAR10 dataset... (Section 4, Experiments) |
| Researcher Affiliation | Academia | ¹School of Mathematics and Physics, University of Queensland, Australia; ²International Computer Science Institute, Berkeley, CA, USA. Correspondence to: Rixon Crane <r.crane@uq.edu.au>. |
| Pseudocode | Yes | Algorithm 1 DINO |
| Open Source Code | Yes | Code is available at https://github.com/RixonC/DINO. |
| Open Datasets | Yes | In Table 3 and Figures 1 and 2, we compare performance on the strongly convex problem of softmax cross-entropy minimization with regularization on the EMNIST Digits dataset. In Figure 3, we consider the non-convex problem of non-linear least-squares without regularization on the CIFAR10 dataset... |
| Dataset Splits | No | The paper mentions using EMNIST Digits and CIFAR10 datasets and presents 'Test Classification Accuracy' and 'Loss on Test Set', implying a test split. However, it does not provide specific details on the train/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology) for reproducibility. |
| Hardware Specification | No | We also run them over a distributed environment comprised of six Amazon Elastic Compute Cloud instances via Amazon Web Services (AWS). These instances are located in Ireland, Ohio, Oregon, Singapore, Sydney and Tokyo. (Section 4, Experiments). The paper mentions AWS instances and a local compute cluster, but it does not provide specific hardware details like GPU/CPU models, types, or memory specifications. |
| Software Dependencies | No | The paper mentions software like PyTorch and TensorFlow in the introduction and refers to algorithms like LSMR, CG, and SVRG being used. However, it does not provide specific version numbers for any software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | For DINO, and DINGO as in (Crane & Roosta, 2019), we use the hyper-parameters θ = 10^-4 and φ = 10^-6. For DINO, DINGO and GIANT we use distributed backtracking line search to select the largest step-size in {1, 2^-1, . . . , 2^-50} that passes, with an Armijo line-search parameter of 10^-4. For Inexact DANE, we set the hyper-parameters η = 1 and µ = 0, as in (Reddi et al., 2016), which gave high performance in (Shamir et al., 2014). We also use the sub-problem solver SVRG (Johnson & Zhang, 2013) and report the best learning rate from {10^-5, . . . , 10^5}. We let AIDE call only one iteration of Inexact DANE, which has the same parameters as the stand-alone Inexact DANE algorithm. We also report the best acceleration parameter, τ in (Reddi et al., 2016), from {10^-5, . . . , 10^5}. For SGD, we report the best learning rate from {10^-5, . . . , 10^5} and at each iteration all workers compute their gradient on a mini-batch of n/(5m) data points. A sketch of the quoted backtracking line-search rule follows the table. |
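
The step-size rule quoted in the Experiment Setup row (the largest step in {1, 2^-1, ..., 2^-50} satisfying an Armijo condition with parameter 10^-4) can be sketched in a few lines. The snippet below is a minimal single-machine illustration, assuming NumPy arrays and user-supplied `loss`/`grad` callables; the names (`armijo_backtracking`, `armijo`, `max_halvings`) are illustrative and not taken from the released DINO code, which performs this search in a distributed fashion.

```python
import numpy as np

def armijo_backtracking(loss, grad, w, direction, armijo=1e-4, max_halvings=50):
    """Return the largest alpha in {1, 2^-1, ..., 2^-50} passing the Armijo test."""
    f0 = loss(w)
    slope = grad(w) @ direction           # directional derivative along the search direction
    for k in range(max_halvings + 1):     # try alpha = 1, 1/2, ..., 2^-50 in decreasing order
        alpha = 2.0 ** (-k)
        if loss(w + alpha * direction) <= f0 + armijo * alpha * slope:
            return alpha
    return 2.0 ** (-max_halvings)         # no trial step passed; fall back to the smallest one

# Toy usage on f(w) = 0.5 * ||w||^2 with the steepest-descent direction.
loss = lambda w: 0.5 * float(w @ w)
grad = lambda w: w
w = np.array([3.0, -4.0])
print(armijo_backtracking(loss, grad, w, -grad(w)))  # 1.0: the full step already passes
```

In the distributed setting described in the paper, the same acceptance test is applied to loss and gradient information aggregated from the workers, so the per-iteration logic is unchanged; only where the quantities are computed differs.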