Distributed Second Order Methods with Fast Rates and Compressed Communication

Authors: Rustem Islamov, Xun Qian, Peter Richtarik

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our results are supported with experimental results on real datasets, and show several orders of magnitude improvement on baseline and state-of-the-art methods in terms of communication complexity." (Abstract) and "We now study the empirical performance of our second order methods NL1, NL2 and CNL, and compare them with relevant benchmarks and with state-of-the-art methods." (Section 4, Experiments)
Researcher Affiliation | Academia | King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; Moscow Institute of Physics and Technology, Dolgoprudny, Russia.
Pseudocode | Yes | "Algorithm 1 NL1: NEWTON-LEARN (λ > 0 case)" and "Algorithm 2 NL2: NEWTON-LEARN (general case)"
Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | "In our experiments we use four standard datasets from the LIBSVM library: a2a, a7a, a9a, and w8a."
Dataset Splits | No | The paper mentions "training data sets" and uses LIBSVM datasets, but it does not explicitly describe how the data was split into training, validation, and test sets, nor does it refer to predefined splits for these datasets.
Hardware Specification | No | The paper mentions "modern computing architectures" and distributed computing generally, but it does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers. It mentions methods like BFGS and DIANA, which imply certain software environments, but no explicit versions are listed.
Experiment Setup | Yes | "Parameter setting. In our experiments, we use the theoretical parameters (e.g., stepsizes) for all the three algorithms... We set the same constants in DINGO (Crane and Roosta, 2019) as they did: θ = 10^-4, φ = 10^-6, ρ = 10^-4, and use backtracking line search for DINGO to select the largest stepsize in {1, 2^-1, 2^-2, 2^-4, ..., 2^-10}. We conduct experiments for two values of the regularization parameter λ: 10^-3, 10^-4. For the a2a dataset, we set number of nodes to n = 15 and the size of local dataset to m = 151." (See the illustrative sketch below the table.)
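
To make the Open Datasets and Experiment Setup rows concrete, here is a minimal Python sketch of how one might load a cited LIBSVM dataset with scikit-learn and reproduce the reported partitioning and parameter grid. It is not the authors' code (none was released, per the Open Source Code row); the local file path "a2a", the contiguous equal-sized sharding, and the expansion of the stepsize grid are assumptions rather than details taken from the paper.

```python
# Illustrative sketch (not the authors' released code): data partitioning and
# parameter grid as described in the "Experiment Setup" row above.
import numpy as np
from sklearn.datasets import load_svmlight_file

# Load one of the LIBSVM datasets named in the "Open Datasets" row
# (a2a, a7a, a9a, w8a). The local file name/path is an assumption.
X, y = load_svmlight_file("a2a")
X = X.toarray()          # a2a is small enough to densify for illustration
y = np.asarray(y)

# "For the a2a dataset, we set number of nodes to n = 15 and the size of
# local dataset to m = 151."
n, m = 15, 151
X, y = X[: n * m], y[: n * m]      # assumption: contiguous, equal-sized shards
node_features = np.split(X, n)     # node_features[i]: node i's local examples
node_labels = np.split(y, n)

# Regularization values used in the experiments.
lambdas = [1e-3, 1e-4]

# DINGO baseline constants quoted in the "Experiment Setup" row.
theta, phi, rho = 1e-4, 1e-6, 1e-4

# Backtracking line-search grid for DINGO; the quote lists
# {1, 2^-1, 2^-2, 2^-4, ..., 2^-10}, which we assume expands to all
# powers from 2^0 down to 2^-10.
dingo_stepsizes = [2.0 ** (-k) for k in range(11)]

print(len(node_features), node_features[0].shape, lambdas, dingo_stepsizes[:3])
```

With the a2a training file in place, the script shards the 15 × 151 = 2265 examples into 15 local datasets and exposes the λ values and DINGO line-search grid as plain Python lists that a reimplementation of the experiments could iterate over.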