A Distributed Second-Order Algorithm You Can Trust

Authors: Celestine Dünner, Aurélien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We provide theoretical rates of convergence for a wide class of problems including L1-regularized objectives. We also demonstrate that our approach achieves state-of-the-art results on multiple large benchmark datasets.
Researcher Affiliation | Collaboration | 1 IBM Research Zürich, Switzerland; 2 ETH Zürich, Switzerland; 3 Albert-Ludwigs-Universität, Freiburg, Germany; 4 EPFL, Lausanne, Switzerland. Correspondence to: Celestine Dünner <cdu@zurich.ibm.com>.
Pseudocode | Yes | Algorithm 1: Adaptive Distributed Newton Method (ADN)
Open Source Code | No | The paper does not provide a link or an explicit statement about the availability of source code for the proposed method. It only provides a link to the open-source implementation of a baseline method (GIANT).
Open Datasets | Yes | We compare ADN to state-of-the-art distributed solvers on four large-scale datasets (see Table 1). Table 1 (Datasets used for the experiments) lists url, webspam, kdda, and criteo.
Dataset Splits | No | The paper mentions using large-scale datasets for training but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). For example, it mentions 'a subsample (10 million examples) of the criteo dataset' but gives no split details.
Hardware Specification | No | The paper states 'If not stated otherwise, we use K = 8 workers' but does not provide any specific hardware details such as CPU/GPU models, memory, or network specifications for these workers.
Software Dependencies | No | The paper states 'All algorithms presented in this section are implemented in C++, they are optimized for sparse data structures and use MPI to handle communication between workers.' However, it does not provide specific version numbers for C++ compilers, MPI libraries, or any other software dependencies.
Experiment Setup | Yes | In addition to σ_0 there are three more parameters in Algorithm 1, namely ζ, γ and ξ, that determine how to update σ_t. The most natural choice for ξ is a small positive value, as we do not want to discard updates that would yield a function decrease; we therefore choose ξ = 0. The convergence of Algorithm 1 is guaranteed for any choice of ζ, γ > 1, and we found empirically that the performance is not very sensitive to the choice of these parameters and the optimal values are robust across different datasets (e.g., γ = ζ = 1.2 is generally a good choice). However, to completely eliminate these parameters from the algorithm we suggest the following practical parameter-free update schedule: σ_{t+1} := [f(A(α^(t) + Δα)) − f(Aα^(t)) − ∇f(Aα^(t))ᵀ A Δα] / [f̂(Aα^(t), A Δα) − f(Aα^(t)) − ∇f(Aα^(t))ᵀ A Δα] · σ_t.
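
For illustration, below is a minimal Python sketch of this parameter-free σ update: σ is rescaled by the ratio of the objective's true deviation from its first-order model to the deviation predicted by the auxiliary model f̂. The function names (f, grad_f, model_f) and the quadratic example objective are assumptions for readability, not the authors' C++/MPI implementation.

```python
import numpy as np

def update_sigma(sigma_t, f, grad_f, model_f, A, alpha_t, delta_alpha):
    """Hedged sketch: rescale sigma_t by (true second-order gap of f) /
    (gap predicted by the auxiliary model f_hat)."""
    v = A @ alpha_t                                # A alpha^(t)
    dv = A @ delta_alpha                           # A * delta_alpha
    linear = f(v) + grad_f(v) @ dv                 # first-order prediction
    actual_gap = f(v + dv) - linear                # true deviation from the linear model
    model_gap = model_f(v, dv, sigma_t) - linear   # deviation predicted by f_hat
    return (actual_gap / model_gap) * sigma_t

# Toy usage with f(v) = 0.5 * ||v||^2 and a sigma-scaled quadratic local model
# (both chosen for illustration only).
if __name__ == "__main__":
    f = lambda v: 0.5 * v @ v
    grad_f = lambda v: v
    model_f = lambda v, dv, s: f(v) + grad_f(v) @ dv + 0.5 * s * (dv @ dv)
    A = np.random.randn(20, 5)
    alpha, delta = np.random.randn(5), 0.1 * np.random.randn(5)
    print(update_sigma(1.0, f, grad_f, model_f, A, alpha, delta))
```

In this toy example the model's curvature matches the true curvature when σ = 1, so the update leaves σ unchanged; in general the ratio grows σ when the model underestimates the true function change and shrinks it otherwise.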