Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression
Authors: Haoyu Zhao, Boyue Li, Zhize Li, Peter Richtárik, Yuejie Chi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are also provided to corroborate our theory and confirm the practical superiority of BEER in the data heterogeneous regime. |
| Researcher Affiliation | Academia | Haoyu Zhao Princeton University EMAIL Boyue Li Carnegie Mellon University EMAIL Zhize Li Carnegie Mellon University EMAIL Peter Richtárik King Abdullah University of Science and Technology EMAIL Yuejie Chi Carnegie Mellon University EMAIL |
| Pseudocode | Yes | Algorithm 1 BEER: BEtter comprEssion for decentRalized optimization |
| Open Source Code | Yes | The code can be accessed at: https://github.com/liboyue/beer. |
| Open Datasets | Yes | We run experiments on two nonconvex problems to compare with the baseline algorithms both with and without communication compression: logistic regression with a nonconvex regularizer [52] on the a9a dataset [5], and training a 1-hidden layer neural network on the MNIST dataset [20]. |
| Dataset Splits | No | The paper mentions splitting 'unshuffled datasets evenly to 10 clients' but does not provide specific train/validation/test percentages, sample counts, or explicit instructions for how the main dataset was split for training, validation, or testing purposes beyond distributing it among clients. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. The author checklist explicitly states 'No' for 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?'. |
| Software Dependencies | No | The paper mentions using a 'biased gsgdb compression [1]' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries/solvers). |
| Experiment Setup | Yes | Moreover, we use the same best-tuned learning rate η = 0.1, batch size b = 100, and biased compression operator (gsgdb) [1] for BEER and CHOCO-SGD on both experiments. |
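The setup row above refers to a biased compression operator (gsgdb) applied to communicated vectors. As a minimal sketch of what a biased contractive compressor looks like, the snippet below implements top-$k$ sparsification, a standard example of this operator class; top-$k$ is used here purely as an illustration and is not necessarily the gsgdb operator from the paper. The function name `topk_compress` is hypothetical.

```python
import numpy as np

def topk_compress(x, k):
    """Biased top-k compression: keep the k largest-magnitude
    entries of x and zero out the rest.

    This is a contractive compressor: for a d-dimensional x,
    ||x - topk_compress(x, k)||^2 <= (1 - k/d) * ||x||^2.
    """
    out = np.zeros_like(x)
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# Example: compress a 4-dimensional vector down to its 2 largest entries.
x = np.array([3.0, -1.0, 0.5, 4.0])
c = topk_compress(x, k=2)
```

Only `k` coordinates are transmitted per round, which is the source of the communication savings that BEER and CHOCO-SGD exploit; the contraction property above is the standard assumption under which such compressors are analyzed.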