A Distributed Second-Order Algorithm You Can Trust

Authors: Celestine Dünner, Aurélien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We provide theoretical rates of convergence for a wide class of problems including L1-regularized objectives. We also demonstrate that our approach achieves state-of-the-art results on multiple large benchmark datasets.
Researcher Affiliation | Collaboration | 1 IBM Research Zürich, Switzerland; 2 ETH Zürich, Switzerland; 3 Albert-Ludwigs-Universität, Freiburg, Germany; 4 EPFL, Lausanne, Switzerland. Correspondence to: Celestine Dünner <cdu@zurich.ibm.com>.
Pseudocode | Yes | Algorithm 1: Adaptive Distributed Newton Method (ADN)
Open Source Code | No | The paper does not provide a link or an explicit statement about the availability of source code for the proposed method. It only provides a link to the open-source implementation of a baseline method (GIANT).
Open Datasets | Yes | We compare ADN to state-of-the-art distributed solvers on four large-scale datasets (see Table 1). Table 1 (Datasets used for the experiments) lists url, webspam, kdda, and criteo.
Dataset Splits | No | The paper mentions using large-scale datasets for training but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). For example, it mentions 'a subsample (10 million examples) of the criteo dataset' but gives no split details.
Hardware Specification | No | The paper states 'If not stated otherwise, we use K = 8 workers' but does not provide any specific hardware details such as CPU/GPU models, memory, or network specifications for these workers.
Software Dependencies | No | The paper states 'All algorithms presented in this section are implemented in C++, they are optimized for sparse data structures and use MPI to handle communication between workers.' However, it does not provide specific version numbers for C++ compilers, MPI libraries, or any other software dependencies.
Experiment Setup | Yes | In addition to σ_0 there are three more parameters in Algorithm 1, namely ζ, γ and ξ, that determine how to update σ_t. The most natural choice for ξ is a small positive value, as we do not want to discard updates that would yield a function decrease; we therefore choose ξ = 0. The convergence of Algorithm 1 is guaranteed for any choice of ζ, γ > 1, and we found empirically that the performance is not very sensitive to the choice of these parameters and the optimal values are robust across different datasets (e.g., γ = ζ = 1.2 is generally a good choice). However, to completely eliminate these parameters from the algorithm we suggest the following practical parameter-free update schedule: σ_{t+1} := [f(A(α^(t) + Δα)) − f(Aα^(t)) − ∇f(Aα^(t))ᵀ A Δα] / [f̂(Aα^(t), A Δα) − f(Aα^(t)) − ∇f(Aα^(t))ᵀ A Δα] · σ_t.
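
For illustration, below is a minimal Python sketch of this parameter-free σ update: σ is rescaled by the ratio of the objective's true deviation from its first-order model to the deviation predicted by the auxiliary model f̂. The function names (f, grad_f, model_f) and the quadratic example objective are assumptions for readability, not the authors' C++/MPI implementation.

```python
import numpy as np

def update_sigma(sigma_t, f, grad_f, model_f, A, alpha_t, delta_alpha):
    """Hedged sketch: rescale sigma_t by (true second-order gap of f) /
    (gap predicted by the auxiliary model f_hat)."""
    v = A @ alpha_t                                # A alpha^(t)
    dv = A @ delta_alpha                           # A * delta_alpha
    linear = f(v) + grad_f(v) @ dv                 # first-order prediction
    actual_gap = f(v + dv) - linear                # true deviation from the linear model
    model_gap = model_f(v, dv, sigma_t) - linear   # deviation predicted by f_hat
    return (actual_gap / model_gap) * sigma_t

# Toy usage with f(v) = 0.5 * ||v||^2 and a sigma-scaled quadratic local model
# (both chosen for illustration only).
if __name__ == "__main__":
    f = lambda v: 0.5 * v @ v
    grad_f = lambda v: v
    model_f = lambda v, dv, s: f(v) + grad_f(v) @ dv + 0.5 * s * (dv @ dv)
    A = np.random.randn(20, 5)
    alpha, delta = np.random.randn(5), 0.1 * np.random.randn(5)
    print(update_sigma(1.0, f, grad_f, model_f, A, alpha, delta))
```

In this toy example the model's curvature matches the true curvature when σ = 1, so the update leaves σ unchanged; in general the ratio grows σ when the model underestimates the true function change and shrinks it otherwise.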