BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression
Authors: Haoyu Zhao, Boyue Li, Zhize Li, Peter Richtárik, Yuejie Chi
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are also provided to corroborate our theory and confirm the practical superiority of BEER in the data heterogeneous regime. |
| Researcher Affiliation | Academia | Haoyu Zhao, Princeton University, haoyu@princeton.edu; Boyue Li, Carnegie Mellon University, boyuel@andrew.cmu.edu; Zhize Li, Carnegie Mellon University, zhizel@andrew.cmu.edu; Peter Richtárik, King Abdullah University of Science and Technology, peter.richtarik@kaust.edu.sa; Yuejie Chi, Carnegie Mellon University, yuejiec@andrew.cmu.edu |
| Pseudocode | Yes | Algorithm 1 BEER: BEtter comprEssion for decentRalized optimization (a hedged sketch of the algorithm's structure appears after this table) |
| Open Source Code | Yes | The code can be accessed at: https://github.com/liboyue/beer. |
| Open Datasets | Yes | We run experiments on two nonconvex problems to compare with the baseline algorithms both with and without communication compression: logistic regression with a nonconvex regularizer [52] on the a9a dataset [5], and training a 1-hidden layer neural network on the MNIST dataset [20]. (A hedged sketch of the regularized logistic objective appears after the table.) |
| Dataset Splits | No | The paper mentions splitting the 'unshuffled datasets evenly to 10 clients' but does not give train/validation/test percentages, sample counts, or any other explicit split details beyond distributing the data among clients. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. The author checklist explicitly states 'No' for 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?'. |
| Software Dependencies | No | The paper mentions using a 'biased gsgd_b compression [1]' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries/solvers). |
| Experiment Setup | Yes | Moreover, we use the same best-tuned learning rate η = 0.1, batch size b = 100, and biased compression operator (gsgd_b) [1] for BEER and CHOCO-SGD on both experiments. (A hedged sketch of a gsgd_b-style compressor appears below.) |
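
For orientation, here is a minimal NumPy sketch of the compressed gradient-tracking structure behind Algorithm 1 (BEER) as we read it: each node keeps a model, a gradient tracker, and compressed surrogates of both, and only compressed differences would be communicated over the network. The top-k compressor, the fully connected mixing matrix, the quadratic toy objective, and the step sizes are illustrative placeholders, not the paper's choices; the authoritative description is Algorithm 1 in the paper and the released code at https://github.com/liboyue/beer.

```python
import numpy as np

def top_k(M, k):
    """Keep the k largest-magnitude entries of each row (a simple contractive compressor)."""
    out = np.zeros_like(M)
    idx = np.argsort(-np.abs(M), axis=1)[:, :k]
    rows = np.arange(M.shape[0])[:, None]
    out[rows, idx] = M[rows, idx]
    return out

def beer_sketch(grad, X0, W, gamma, eta, k, num_iters):
    """Hedged sketch of BEER-style compressed gradient tracking (full-gradient version).

    grad(X) returns the stacked local gradients (one row per node); W is a symmetric,
    doubly stochastic mixing matrix. H and G are the compressed surrogates of the
    models X and trackers V that neighbors would hold copies of.
    """
    X = X0.copy()
    H = np.zeros_like(X)        # compressed surrogate of the models X
    G = np.zeros_like(X)        # compressed surrogate of the trackers V
    V = grad(X)                 # gradient tracker, initialized at the local gradients
    prev_grad = V.copy()
    for _ in range(num_iters):
        H = H + top_k(X - H, k)                    # update model surrogates from compressed differences
        X = X + gamma * (W @ H - H) - eta * V      # mix surrogates, then take a descent step
        G = G + top_k(V - G, k)                    # update tracker surrogates from compressed differences
        new_grad = grad(X)
        V = V + gamma * (W @ G - G) + new_grad - prev_grad   # gradient-tracking update
        prev_grad = new_grad
    return X

# Toy usage (illustrative only): 10 nodes, local quadratics f_i(x) = 0.5 * ||x - a_i||^2.
rng = np.random.default_rng(0)
n, d = 10, 20
A = rng.normal(size=(n, d))
W = np.full((n, n), 1.0 / n)    # fully connected mixing matrix, for illustration
X_final = beer_sketch(lambda X: X - A, np.zeros((n, d)), W, gamma=0.5, eta=0.1, k=5, num_iters=200)
```

Because W is doubly stochastic, summing the tracker update over nodes shows that the average of V always equals the average of the local gradients, which is the gradient-tracking property the sketch relies on.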
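
The first experiment is logistic regression with a nonconvex regularizer on a9a. The paper defers the regularizer's definition to its reference [52]; a form that is common in this literature, and which we assume here purely for illustration, is λ Σ_j w_j² / (1 + w_j²). A minimal sketch of such an objective:

```python
import numpy as np

def nonconvex_logistic_loss(w, A, y, lam=0.01):
    """Binary logistic loss plus a nonconvex regularizer lam * sum_j w_j^2 / (1 + w_j^2).

    A: (m, d) feature matrix, y: (m,) labels in {-1, +1}, w: (d,) model parameters.
    The regularizer's exact form is an assumption; the paper specifies it via its reference [52].
    """
    margins = y * (A @ w)
    logistic = np.mean(np.log1p(np.exp(-margins)))
    regularizer = lam * np.sum(w ** 2 / (1.0 + w ** 2))
    return logistic + regularizer
```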
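
The setup row also names the biased gsgd_b compressor from [1]. We read this as a b-bit stochastic quantizer in the QSGD family that is deterministically rescaled so it becomes contractive at the price of losing unbiasedness; the exact operator is the one defined in [1] and implemented in the released code, so treat the following as an assumption-laden sketch.

```python
import numpy as np

def gsgd_b(x, b=4, rng=None):
    """Hedged sketch of a biased b-bit quantizer in the QSGD family.

    Stochastic b-bit quantization of each coordinate against the vector norm,
    followed by a deterministic rescaling that trades unbiasedness for the
    contraction property used by compressed decentralized methods. This is our
    paraphrase; the exact gsgd_b operator is the one in [1] and in the code.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x)
    s = 2 ** (b - 1)                                          # number of quantization levels
    rounded = np.floor(s * np.abs(x) / norm + rng.uniform(size=x.shape))  # stochastic rounding
    q = np.sign(x) * norm * rounded / s                       # unbiased QSGD-style quantization
    tau = 1.0 + min(d / s ** 2, np.sqrt(d) / s)               # QSGD second-moment bound
    return q / tau                                            # rescaling makes the operator biased but contractive
```

Dividing by the QSGD second-moment bound is one standard way to turn an unbiased quantizer into a contractive, biased one; whether the experiments' gsgd_b does exactly this is not stated in the excerpts above.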