Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

Authors: Da Chang, Ganzhao Yuan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across diverse tasks, including MAE pretraining, LLM pretraining, and downstream fine-tuning, demonstrate that our MGUP-enhanced optimizers achieve superior or more stable performance compared to their original base optimizers. We validate the proposed MGUP optimizers through key experiments, including MAE pretraining of ViT-27M on CIFAR-10; autoregressive pretraining of LLaMA2-71M and Qwen2.5-150M on Wikitext-103; and fine-tuning of RoBERTa-base on GLUE and LLaMA2-7B for GSM-8K.
Researcher Affiliation | Academia | 1 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; 2 Shenzhen University of Advanced Technology; 3 Pengcheng Laboratory; 4 University of Chinese Academy of Sciences
Pseudocode | Yes | Algorithm 1 MGUP-AdamW; Algorithm 2 MGUP
Open Source Code | Yes | The code is publicly available at https://github.com/MaeChd/MGUP.
Open Datasets | Yes | Datasets. We use the image dataset CIFAR-10, the text dataset Wikitext-103, and the language model fine-tuning benchmarks GLUE and GSM-8K.
Dataset Splits | Yes | Datasets. We use the image dataset CIFAR-10, the text dataset Wikitext-103, and the language model fine-tuning benchmarks GLUE and GSM-8K. For GLUE, the Hugging Face implementation is used. For GSM8K, evaluation is via standardized lm-evaluation-harness on the GSM8K benchmark with the Hugging Face implementation. These are standard benchmarks with well-defined splits, implicitly used by researchers.
Hardware Specification | Yes | All experiments are conducted using two NVIDIA V100 (32GB) GPUs and four NVIDIA RTX 4090 (24GB) GPUs.
Software Dependencies | No | The text mentions 'Hugging Face implementation' for GLUE and GSM-8K, 'llm-foundry codebase' and 'lm-evaluation-harness' for GSM-8K fine-tuning and evaluation, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | Detailed experimental settings are provided in Appendix G. This appendix includes tables such as 'Table 3: Hyperparameters used for training ViT', 'Table 4: Hyperparameters used for training LLaMA2-71M on WikiText-103', 'Table 5: Hyperparameters used for training Qwen2.5-150M on WikiText-103', 'Table 6: Hyperparameters used for fine-tuning on GLUE', and 'Table 7: Hyperparameter configurations for fine-tuning LLaMA2-7B on GSM8K', which detail learning rates, batch sizes, epochs, weight decay, and other optimizer parameters.
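
The Software Dependencies entry is marked "No" because no version numbers are reported. As a minimal sketch of how such versions could be recorded, the snippet below queries installed package versions with Python's standard importlib.metadata. The package list and the function name installed_versions are hypothetical illustrations; the paper does not state which distributions or versions were used.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical package list: these distribution names are assumptions,
# not taken from the paper or its repository.
PACKAGES = ["torch", "transformers", "datasets", "lm-eval", "llm-foundry"]

if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        if version is None:
            print(f"# {name}: not installed")
        else:
            print(f"{name}=={version}")
```

Redirecting the output to a requirements-style file alongside the experiment logs is one way to make the environment recoverable later.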