Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

DUO: No Compromise to Accuracy Degradation

Authors: Jinda Jia, Cong Xie, Hanlin Lu, Fanjiang Ye, Hao Feng, Daoce Wang, Haibin Lin, Zhi Zhang, Xin Liu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments are organized into two parts: accuracy and performance. Together, these evaluations show that DUO improves training accuracy while maintaining nearly identical training speed in Sharded Data Parallelism with gradient compression and can even increase throughput by enabling more aggressive compression.
Researcher Affiliation | Collaboration | ¹Indiana University, ²ByteDance Inc.
Pseudocode | Yes | Algorithm 1: Distributed training with DUO. Algorithm 2: SGD + Fast-Slow Update.
Open Source Code | Yes | We also provide our source code along with detailed instructions to reproduce the main training results.
Open Datasets | Yes | We use The Pile [12] as our training dataset due to its open-source availability and general applicability.
Dataset Splits | No | All pre-training tasks are run for 80 000 iterations, processing 80 billion tokens in total. Accuracy is evaluated via validation-loss comparison across all settings. While validation loss is mentioned, the specific split percentages or sample counts for training, validation, and testing are not explicitly detailed in the paper.
Hardware Specification | Yes | 4-node setup with A100 GPUs: Each node is equipped with 4 NVIDIA A100-SXM4-40GB GPUs connected via NVLink. Nodes are connected by 100 Gbps Slingshot links. 8-node setup with H20 GPUs: Each node is equipped with 8 NVIDIA H20 GPUs connected via NVLink. Nodes are connected by 400 Gbps InfiniBand links. 4-node setup with single A100 GPU: Each node has a single NVIDIA A100-SXM4-40GB GPU. Nodes are connected by 100 Gbps Ethernet links.
Software Dependencies | No | To effectively evaluate the proposed method, we integrate DUO into Megatron-LM, one of the most commonly used open-source LLM training frameworks. No specific version numbers for Megatron-LM or other software dependencies are provided in the paper's text.
Experiment Setup | Yes | The hyperparameter settings follow those of the OPT model [35], ensuring that models of the same size use identical configurations. For reproducibility, we provide the detailed configurations in Table 6 (Appendix B). Additionally, Table 7 (Appendix C) provides training configuration for the throughput test.
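
The figures quoted in the Dataset Splits and Hardware Specification rows can be cross-checked with a little arithmetic. The sketch below is illustrative only: the class and function names are hypothetical, and the tokens-per-iteration value is derived here under the assumption that tokens are spread evenly across iterations; the summary itself states only the totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSetup:
    """One hardware setup as quoted in the summary (names are hypothetical)."""
    name: str
    nodes: int
    gpus_per_node: int
    inter_node_gbps: int  # quoted inter-node link speed

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node


# Values copied from the Hardware Specification row.
SETUPS = [
    ClusterSetup("A100, 4 nodes x 4 GPUs, Slingshot", 4, 4, 100),
    ClusterSetup("H20, 8 nodes x 8 GPUs, InfiniBand", 8, 8, 400),
    ClusterSetup("A100, 4 nodes x 1 GPU, Ethernet", 4, 1, 100),
]

# Values copied from the Dataset Splits row.
TOTAL_TOKENS = 80_000_000_000
ITERATIONS = 80_000


def tokens_per_iteration(total_tokens: int, iterations: int) -> int:
    """Average tokens per iteration, assuming an even split (an assumption)."""
    return total_tokens // iterations


if __name__ == "__main__":
    for setup in SETUPS:
        print(f"{setup.name}: {setup.total_gpus} GPUs in total")
    print("tokens per iteration:", tokens_per_iteration(TOTAL_TOKENS, ITERATIONS))
```

Under these assumptions, the three setups come to 16, 64, and 4 GPUs respectively, and each iteration processes roughly one million tokens on average.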