BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression
Authors: Haoyu Zhao, Boyue Li, Zhize Li, Peter Richtárik, Yuejie Chi
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are also provided to corroborate our theory and confirm the practical superiority of BEER in the data heterogeneous regime. |
| Researcher Affiliation | Academia | Haoyu Zhao, Princeton University, haoyu@princeton.edu; Boyue Li, Carnegie Mellon University, boyuel@andrew.cmu.edu; Zhize Li, Carnegie Mellon University, zhizel@andrew.cmu.edu; Peter Richtárik, King Abdullah University of Science and Technology, peter.richtarik@kaust.edu.sa; Yuejie Chi, Carnegie Mellon University, yuejiec@andrew.cmu.edu |
| Pseudocode | Yes | Algorithm 1 BEER: BEtter comprEssion for decentRalized optimization (a hedged sketch of the algorithm's structure appears after this table) |
| Open Source Code | Yes | The code can be accessed at: https://github.com/liboyue/beer. |
| Open Datasets | Yes | We run experiments on two nonconvex problems to compare with the baseline algorithms both with and without communication compression: logistic regression with a nonconvex regularizer [52] on the a9a dataset [5], and training a 1-hidden layer neural network on the MNIST dataset [20]. (A hedged sketch of the regularized logistic objective appears after the table.) |
| Dataset Splits | No | The paper mentions splitting the 'unshuffled datasets evenly to 10 clients' but does not give train/validation/test percentages, sample counts, or any other explicit split details beyond distributing the data among clients. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. The author checklist explicitly states 'No' for 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?'. |
| Software Dependencies | No | The paper mentions using a 'biased gsgd_b compression [1]' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries/solvers). |
| Experiment Setup | Yes | Moreover, we use the same best-tuned learning rate η = 0.1, batch size b = 100, and biased compression operator (gsgd_b) [1] for BEER and CHOCO-SGD on both experiments. (A hedged sketch of a gsgd_b-style compressor appears below.) |
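
For orientation, here is a minimal NumPy sketch of the compressed gradient-tracking structure behind Algorithm 1 (BEER) as we read it: each node keeps a model, a gradient tracker, and compressed surrogates of both, and only compressed differences would be communicated over the network. The top-k compressor, the fully connected mixing matrix, the quadratic toy objective, and the step sizes are illustrative placeholders, not the paper's choices; the authoritative description is Algorithm 1 in the paper and the released code at https://github.com/liboyue/beer.

```python
import numpy as np

def top_k(M, k):
    """Keep the k largest-magnitude entries of each row (a simple contractive compressor)."""
    out = np.zeros_like(M)
    idx = np.argsort(-np.abs(M), axis=1)[:, :k]
    rows = np.arange(M.shape[0])[:, None]
    out[rows, idx] = M[rows, idx]
    return out

def beer_sketch(grad, X0, W, gamma, eta, k, num_iters):
    """Hedged sketch of BEER-style compressed gradient tracking (full-gradient version).

    grad(X) returns the stacked local gradients (one row per node); W is a symmetric,
    doubly stochastic mixing matrix. H and G are the compressed surrogates of the
    models X and trackers V that neighbors would hold copies of.
    """
    X = X0.copy()
    H = np.zeros_like(X)        # compressed surrogate of the models X
    G = np.zeros_like(X)        # compressed surrogate of the trackers V
    V = grad(X)                 # gradient tracker, initialized at the local gradients
    prev_grad = V.copy()
    for _ in range(num_iters):
        H = H + top_k(X - H, k)                    # update model surrogates from compressed differences
        X = X + gamma * (W @ H - H) - eta * V      # mix surrogates, then take a descent step
        G = G + top_k(V - G, k)                    # update tracker surrogates from compressed differences
        new_grad = grad(X)
        V = V + gamma * (W @ G - G) + new_grad - prev_grad   # gradient-tracking update
        prev_grad = new_grad
    return X

# Toy usage (illustrative only): 10 nodes, local quadratics f_i(x) = 0.5 * ||x - a_i||^2.
rng = np.random.default_rng(0)
n, d = 10, 20
A = rng.normal(size=(n, d))
W = np.full((n, n), 1.0 / n)    # fully connected mixing matrix, for illustration
X_final = beer_sketch(lambda X: X - A, np.zeros((n, d)), W, gamma=0.5, eta=0.1, k=5, num_iters=200)
```

Because W is doubly stochastic, summing the tracker update over nodes shows that the average of V always equals the average of the local gradients, which is the gradient-tracking property the sketch relies on.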
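
The first experiment is logistic regression with a nonconvex regularizer on a9a. The paper defers the regularizer's definition to its reference [52]; a form that is common in this literature, and which we assume here purely for illustration, is λ Σ_j w_j² / (1 + w_j²). A minimal sketch of such an objective:

```python
import numpy as np

def nonconvex_logistic_loss(w, A, y, lam=0.01):
    """Binary logistic loss plus a nonconvex regularizer lam * sum_j w_j^2 / (1 + w_j^2).

    A: (m, d) feature matrix, y: (m,) labels in {-1, +1}, w: (d,) model parameters.
    The regularizer's exact form is an assumption; the paper specifies it via its reference [52].
    """
    margins = y * (A @ w)
    logistic = np.mean(np.log1p(np.exp(-margins)))
    regularizer = lam * np.sum(w ** 2 / (1.0 + w ** 2))
    return logistic + regularizer
```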
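
The setup row also names the biased gsgd_b compressor from [1]. We read this as a b-bit stochastic quantizer in the QSGD family that is deterministically rescaled so it becomes contractive at the price of losing unbiasedness; the exact operator is the one defined in [1] and implemented in the released code, so treat the following as an assumption-laden sketch.

```python
import numpy as np

def gsgd_b(x, b=4, rng=None):
    """Hedged sketch of a biased b-bit quantizer in the QSGD family.

    Stochastic b-bit quantization of each coordinate against the vector norm,
    followed by a deterministic rescaling that trades unbiasedness for the
    contraction property used by compressed decentralized methods. This is our
    paraphrase; the exact gsgd_b operator is the one in [1] and in the code.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x)
    s = 2 ** (b - 1)                                          # number of quantization levels
    rounded = np.floor(s * np.abs(x) / norm + rng.uniform(size=x.shape))  # stochastic rounding
    q = np.sign(x) * norm * rounded / s                       # unbiased QSGD-style quantization
    tau = 1.0 + min(d / s ** 2, np.sqrt(d) / s)               # QSGD second-moment bound
    return q / tau                                            # rescaling makes the operator biased but contractive
```

Dividing by the QSGD second-moment bound is one standard way to turn an unbiased quantizer into a contractive, biased one; whether the experiments' gsgd_b does exactly this is not stated in the excerpts above.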