Communication-Efficient Distributed Optimization with Quantized Preconditioners

Authors: Foivos Alimisis, Peter Davies, Dan Alistarh

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We also validate our findings experimentally, showing fast convergence and reduced communication." |
| Researcher Affiliation | Collaboration | (1) Department of Mathematics, University of Geneva, Switzerland (work done while at IST Austria); (2) IST Austria; (3) Neural Magic, US |
| Pseudocode | Yes | The algorithm is presented as numbered lists of steps under Sections 3.1 ("The Algorithm") and 4.1 ("Algorithm Description"). |
| Open Source Code | No | The paper provides no statement or link indicating that source code for the described methodology is available. |
| Open Datasets | Yes | "We use the dataset cpusmall scale from LIBSVM (Chang & Lin, 2011). ... We demonstrate the methods on the phishing and german numer datasets from the LIBSVM collection (Chang & Lin, 2011)." |
| Dataset Splits | No | The paper does not give training/validation/test split details (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (GPU model, CPU type, or memory). |
| Software Dependencies | No | The paper mentions QSGD (Alistarh et al., 2016) and the Hadamard-rotation based method (Suresh et al., 2017) for gradient quantization, but gives no version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "The learning rate (lr in the figure titles) is set close to the maximum for which gradient descent will converge... The number of bits per coordinate used to quantize gradients (qb) and preconditioners (pb) are also shown; the latter is an average since the quantization method uses a variable number of bits. ... we test each with learning rates in {2^0, 2^-1, 2^-2, ...}, and plot the highest rate for which the method stably converges." (See the quantization and learning-rate sweep sketches below the table.) |
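
To make the "qb bits per coordinate" setup concrete, here is a minimal NumPy sketch of unbiased stochastic quantization in the style of QSGD (Alistarh et al., 2016), which the paper cites for gradient quantization. The function name and interface are ours, and the paper's exact encoder (including its variable-bit preconditioner quantization) may differ.

```python
import numpy as np

def qsgd_quantize(v, bits, rng=None):
    # Unbiased stochastic quantization of v to `bits` bits per
    # coordinate, in the style of QSGD (Alistarh et al., 2016).
    # Illustrative sketch only; not the paper's exact encoder.
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    s = 2 ** bits - 1                 # number of quantization levels
    scaled = np.abs(v) / norm * s     # magnitudes mapped to [0, s]
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part,
    # which makes the quantizer unbiased in expectation.
    levels = lower + (rng.random(v.shape) < (scaled - lower))
    return np.sign(v) * levels / s * norm
```

Each coordinate is then transmitted as a sign bit plus a level index in {0, ..., 2^bits - 1}, with one norm sent per vector; that is where the communication savings over full-precision gradients come from.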
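The learning-rate protocol quoted in the last row (sweep rates {2^0, 2^-1, 2^-2, ...} and keep the highest one that still converges stably) fits in a few lines. A minimal sketch, assuming a hypothetical `run_method(lr)` callback that returns the loss trajectory of one run; the stability test used here (finite losses, final loss no worse than the initial loss) is our stand-in, since the paper does not spell out its exact criterion.

```python
import numpy as np

def highest_stable_lr(run_method, max_exp=0, min_exp=-10):
    # Try learning rates 2^0, 2^-1, 2^-2, ... from largest to
    # smallest and return the first (i.e. highest) one for which
    # the run is judged stable.
    for e in range(max_exp, min_exp - 1, -1):
        lr = 2.0 ** e
        losses = np.asarray(run_method(lr))
        # Stand-in stability criterion: no NaN/inf blow-up and the
        # final loss is no worse than the starting loss.
        if np.all(np.isfinite(losses)) and losses[-1] <= losses[0]:
            return lr
    return None  # nothing in the swept range converged stably
```

Sweeping from the largest rate downward means the first stable run found is by construction the highest stable rate, matching the paper's "plot the highest rate for which the method stably converges."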