DINO: Distributed Newton-Type Optimization Method
Authors: Rixon Crane, Fred Roosta
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we examine the empirical performance of DINO in comparison to the, previously discussed, distributed second-order methods DINGO, DiSCO, GIANT, Inexact DANE and AIDE. We also compare these to synchronous SGD (Chen et al., 2016). In all experiments, we consider (1) with (2), where S1, . . . , Sm partition {1, . . . , n} with each having equal size n/m. In Table 3 and Figures 1 and 2, we compare performance on the strongly convex problem of softmax cross-entropy minimization with regularization on the EMNIST Digits dataset. In Figure 3, we consider the non-convex problem of non-linear least-squares without regularization on the CIFAR10 dataset... (Section 4, Experiments) |
| Researcher Affiliation | Academia | ¹School of Mathematics and Physics, University of Queensland, Australia; ²International Computer Science Institute, Berkeley, CA, USA. Correspondence to: Rixon Crane <r.crane@uq.edu.au>. |
| Pseudocode | Yes | Algorithm 1 DINO |
| Open Source Code | Yes | Code is available at https://github.com/RixonC/DINO. |
| Open Datasets | Yes | In Table 3 and Figures 1 and 2, we compare performance on the strongly convex problem of softmax cross-entropy minimization with regularization on the EMNIST Digits dataset. In Figure 3, we consider the non-convex problem of non-linear least-squares without regularization on the CIFAR10 dataset... |
| Dataset Splits | No | The paper mentions using EMNIST Digits and CIFAR10 datasets and presents 'Test Classification Accuracy' and 'Loss on Test Set', implying a test split. However, it does not provide specific details on the train/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology) for reproducibility. |
| Hardware Specification | No | We also run them over a distributed environment comprised of six Amazon Elastic Compute Cloud instances via Amazon Web Services (AWS). These instances are located in Ireland, Ohio, Oregon, Singapore, Sydney and Tokyo. (Section 4, Experiments). The paper mentions AWS instances and a local compute cluster, but it does not provide specific hardware details like GPU/CPU models, types, or memory specifications. |
| Software Dependencies | No | The paper mentions software like PyTorch and TensorFlow in the introduction and refers to algorithms like LSMR, CG, and SVRG being used. However, it does not provide specific version numbers for any software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | For DINO, and DINGO as in (Crane & Roosta, 2019), we use the hyper-parameters θ = 10^-4 and φ = 10^-6. For DINO, DINGO and GIANT we use distributed backtracking line search to select the largest step-size in {1, 2^-1, . . . , 2^-50} that passes, with an Armijo line-search parameter of 10^-4. For Inexact DANE, we set the hyper-parameters η = 1 and µ = 0, as in (Reddi et al., 2016), which gave high performance in (Shamir et al., 2014). We also use the sub-problem solver SVRG (Johnson & Zhang, 2013) and report the best learning rate from {10^-5, . . . , 10^5}. We let AIDE call only one iteration of Inexact DANE, which has the same parameters as the stand-alone Inexact DANE algorithm. We also report the best acceleration parameter, τ in (Reddi et al., 2016), from {10^-5, . . . , 10^5}. For SGD, we report the best learning rate from {10^-5, . . . , 10^5} and at each iteration all workers compute their gradient on a mini-batch of n/(5m) data points. A sketch of the quoted backtracking line-search rule follows the table. |
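
The step-size rule quoted in the Experiment Setup row (the largest step in {1, 2^-1, ..., 2^-50} satisfying an Armijo condition with parameter 10^-4) can be sketched in a few lines. The snippet below is a minimal single-machine illustration, assuming NumPy arrays and user-supplied `loss`/`grad` callables; the names (`armijo_backtracking`, `armijo`, `max_halvings`) are illustrative and not taken from the released DINO code, which performs this search in a distributed fashion.

```python
import numpy as np

def armijo_backtracking(loss, grad, w, direction, armijo=1e-4, max_halvings=50):
    """Return the largest alpha in {1, 2^-1, ..., 2^-50} passing the Armijo test."""
    f0 = loss(w)
    slope = grad(w) @ direction           # directional derivative along the search direction
    for k in range(max_halvings + 1):     # try alpha = 1, 1/2, ..., 2^-50 in decreasing order
        alpha = 2.0 ** (-k)
        if loss(w + alpha * direction) <= f0 + armijo * alpha * slope:
            return alpha
    return 2.0 ** (-max_halvings)         # no trial step passed; fall back to the smallest one

# Toy usage on f(w) = 0.5 * ||w||^2 with the steepest-descent direction.
loss = lambda w: 0.5 * float(w @ w)
grad = lambda w: w
w = np.array([3.0, -4.0])
print(armijo_backtracking(loss, grad, w, -grad(w)))  # 1.0: the full step already passes
```

In the distributed setting described in the paper, the same acceptance test is applied to loss and gradient information aggregated from the workers, so the per-iteration logic is unchanged; only where the quantities are computed differs.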