Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Birder: Communication-Efficient 1-bit Adaptive Optimizer for Practical Distributed DNN Training
Authors: Hanyang Peng, Shuang Qin, Yue Yu, Jin Wang, Hui Wang, Ge Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments, conducted on 8 to 64 GPUs (1 to 8 nodes) using DDP, demonstrate that Birder achieves comparable inference performance to uncompressed SGDM/Adam, with up to 2.5× speedup for training ResNet-50 and 6.3× speedup for training BERT-Base. |
| Researcher Affiliation | Academia | Hanyang Peng¹, Shuang Qin¹, Yue Yu¹, Jin Wang¹, Hui Wang¹, Ge Li²; ¹Peng Cheng Laboratory, Shenzhen, China; ²School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China |
| Pseudocode | Yes | Algorithm 1. Birder |
| Open Source Code | Yes | Code is publicly available at https://openi.pcl.ac.cn/c2net_optim/Birder. |
| Open Datasets | Yes | For the experiments over ResNet-50, we evaluate the convergence and performance of SGDM, 1-bit Adam and Birder on ILSVRC2012... For the experiments over BERT-Base, we assess the convergence and performance of BertAdam (baseline), 1-bit Adam and Birder for the SQuAD 1.1 fine-tuning task using a pre-trained BERT-Base model checkpoint from Hugging Face. |
| Dataset Splits | No | The paper uses well-known datasets (ILSVRC2012, SQuAD 1.1, CIFAR100, Penn Tree Bank, Wikipedia) which often have standard validation splits, but it does not explicitly state the details of a validation split (e.g., percentages, sample counts, or how it was used) for its experiments. |
| Hardware Specification | Yes | Our experiments were conducted on a testbed consisting of 1, 2, 4, 8 nodes interconnected via 10Gbps Ethernet. Each node was equipped with 8 Nvidia Tesla A100-80GB GPUs. |
| Software Dependencies | Yes | PyTorch 1.11.0 was used as the primary framework, accompanied by CUDA-11.6, cuDNN-8.2, NCCL-2.10.3, and PyTorch 1.11.0 for other relevant libraries. |
| Experiment Setup | Yes | For the experiments over ResNet-50, we evaluate the convergence and performance of SGDM, 1-bit Adam and Birder on ILSVRC2012. The batch size per GPU is set to 32 or 128... When employing SGDM (baseline), the learning rate starts at 0.1 × batch size / 256 with momentum of 0.9 and weight decay of 0.0001. When employing 1-bit Adam and Birder, the learning rate starts at 0.001 × batch size / 256 with weight decay of 0.0001, and [β1, β2] for 1-bit Adam is set to [0.9, 0.999] and β for Birder is set to 0.95. Then, the learning rate is divided by 10 after 30, 60 and 90 epochs, and training is finally terminated after 100 epochs. |
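
The ResNet-50 learning-rate schedule quoted in the Experiment Setup row can be sketched in a few lines of plain Python. This is only an illustration of the quoted rule (a starting rate of base × batch size / 256, divided by 10 after 30, 60 and 90 epochs, with training ending after 100 epochs), not the authors' code. The function names `initial_lr` and `lr_at_epoch` are hypothetical, epochs are assumed to be counted from 0, and the excerpt does not say whether "batch size" in the formula means the per-GPU or the total batch size, so the caller has to decide which to pass.

```python
# Hypothetical helpers illustrating the quoted schedule; not taken from the Birder code base.

MILESTONES = (30, 60, 90)   # epochs after which the rate is divided by 10 (from the quoted setup)
TOTAL_EPOCHS = 100          # training stops after 100 epochs (from the quoted setup)


def initial_lr(base_lr: float, batch_size: int) -> float:
    """Starting learning rate: base_lr * batch_size / 256.

    Whether batch_size is per-GPU or global is an assumption left to the caller.
    """
    return base_lr * batch_size / 256


def lr_at_epoch(base_lr: float, batch_size: int, epoch: int) -> float:
    """Learning rate used during a 0-indexed epoch under the step schedule."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS})")
    # Count how many milestones have already been passed.
    drops = sum(1 for m in MILESTONES if epoch >= m)
    return initial_lr(base_lr, batch_size) / (10 ** drops)


if __name__ == "__main__":
    # Base rates from the quoted setup: 0.1 for SGDM, 0.001 for 1-bit Adam and Birder.
    for name, base in (("SGDM", 0.1), ("1-bit Adam / Birder", 0.001)):
        rates = [lr_at_epoch(base, 256, e) for e in (0, 30, 60, 90)]
        print(name, rates)
```

With a batch size of 256 the scaling factor is 1, so the SGDM rate starts at 0.1 and the 1-bit Adam and Birder rate at 0.001, each dropping by a factor of 10 at the three milestones.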