Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

Authors: Da Chang, Ganzhao Yuan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across diverse tasks, including MAE pretraining, LLM pretraining, and downstream fine-tuning, demonstrate that our MGUP-enhanced optimizers achieve superior or more stable performance compared to their original base optimizers. We validate the proposed MGUP optimizers through key experiments, including MAE pretraining of ViT-27M on CIFAR-10; autoregressive pretraining of LLaMA2-71M and Qwen2.5-150M on Wikitext-103; and fine-tuning of RoBERTa-base on GLUE and LLaMA2-7B for GSM-8K.
Researcher Affiliation	Academia	(1) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; (2) Shenzhen University of Advanced Technology; (3) Pengcheng Laboratory; (4) University of Chinese Academy of Sciences
Pseudocode	Yes	Algorithm 1: MGUP-AdamW; Algorithm 2: MGUP
Open Source Code	Yes	The code is publicly available at https://github.com/MaeChd/MGUP.
Open Datasets	Yes	Datasets. We use the image dataset CIFAR-10, the text dataset Wikitext-103, and the language model fine-tuning benchmarks GLUE and GSM-8K.
Dataset Splits	Yes	Datasets. We use the image dataset CIFAR-10, the text dataset Wikitext-103, and the language model fine-tuning benchmarks GLUE and GSM-8K. For GLUE, the Hugging Face implementation is used. For GSM8K, evaluation is via standardized lm-evaluation-harness on the GSM8K benchmark with the Hugging Face implementation. These are standard benchmarks with well-defined splits, implicitly used by researchers.
Hardware Specification	Yes	All experiments are conducted using two NVIDIA V100 (32GB) GPUs and four NVIDIA RTX 4090 (24GB) GPUs.
Software Dependencies	No	The text mentions 'Hugging Face implementation' for GLUE and GSM-8K, 'llm-foundry codebase' and 'lm-evaluation-harness' for GSM-8K fine-tuning and evaluation, but does not provide specific version numbers for these software components.
Experiment Setup	Yes	Detailed experimental settings are provided in Appendix G. This appendix includes tables such as 'Table 3: Hyperparameters used for training ViT', 'Table 4: Hyperparameters used for training LLaMA2-71M on WikiText-103', 'Table 5: Hyperparameters used for training Qwen2.5-150M on WikiText-103', 'Table 6: Hyperparameters used for fine-tuning on GLUE', and 'Table 7: Hyperparameter configurations for fine-tuning LLaMA2-7B on GSM8K', which detail learning rates, batch sizes, epochs, weight decay, and other optimizer parameters.