Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Exploring Landscapes for Better Minima along Valleys

Authors: Tong Zhao, Jiacheng Li, Yuanchang Zhou, Guangming Tan, Weile Jia

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our testing results show that the adapted Lamb, ALTO, increases the test accuracy (generalization) of the current state-of-the-art optimizer by an average of 2.5% across a variety of large-batch training tasks. This work potentially opens a new research direction in the design of optimization algorithms. ... Our experimental results demonstrate the superior performance of ALTO across various datasets and tasks, such as CV [20, 50, 42] and NLP [32, 33] training, with 3-5 times hyperparameter tuning per task for all optimizers in large batch training. Compared to the current state-of-the-art, ALTO achieves better accuracy in all our 17 CV and NLP experimental tasks and can save 29.68% of computation time on a typical CV task while reaching the same accuracy.
Researcher Affiliation	Academia	Tong Zhao1,2 Jiacheng Li1,2 Yuanchang Zhou1,2 Guangming Tan1,2, Weile Jia1,2, 1State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences 2University of Chinese Academy of Sciences. EMAIL
Pseudocode	Yes	Algorithm 1: ALTO Vanilla ... Algorithm 2: ALTO ... Algorithm 3: ESGD ... Algorithm 4: EAdam ... Algorithm 5: Generic form of E-adapted optimizer ... Algorithm 6: ALTO Vanilla ... Algorithm 7: ALTO Vanilla
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes]
Open Datasets	Yes	Our experimental results demonstrate the superior performance of ALTO across various datasets and tasks, such as CV [20, 50, 42] and NLP [32, 33] training, with 3-5 times hyperparameter tuning per task for all optimizers in large batch training. ... diverse datasets (CIFAR-10, CIFAR-100 [24], ImageNet [15], CoNLL-2003 [38], IMDB [2], MRPC [46], and GPT-2 Output Dataset [1])
Dataset Splits	Yes	For this simple univariate function fitting task, our training set was constructed with uniformly sampled points from y=sin(x), where x ranges from -10 to 10, with a total of 32,786 / 0.8 samples. Noise with a mean of zero was added to the corresponding y values. In the dataset, 80% was used as the training set and 20% as the validation set.
Hardware Specification	Yes	Machine configuration. All our experiments were conducted on single node equipped with 4 NVIDIA 80GB A100 GPUs interconnected with PCI-E3.0.
Software Dependencies	No	The paper mentions software components like "Megatron-LM Framework" and "Tianshou", but does not specify their version numbers or other key software dependencies with specific versions.
Experiment Setup	Yes	Hyperparameters. Though ALTO introduces five extra hyperparameters compared with Adam, we usually and only adjust parameter β1 and η according to batch size. It is clear that the larger the batch size is, the larger the α and β1 should be. Hence, we set α = 0.5, β1 = 0.01 in small batch training (batch size < 1K) and α = 5, β1 = 0.99 in large batch case (batch size ≥ 1K), unless otherwise specified. If not mentioned, we set β2 = 0.9, β3 = 0.99, λ = 10⁻⁴, ε1 = 10⁻⁶, ε2 = 10⁻⁶, ε3 = 10⁻¹⁰.
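
The defaults quoted in the Experiment Setup row can be collected into a small configuration helper. The following is a minimal sketch, not the authors' code: the names AltoHyperparams and default_hyperparams are hypothetical, "1K" is read as 1000 (it may mean 1024), and the exponents of λ and ε1 to ε3 are assumed to be negative because the signs appear to have been lost in extraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AltoHyperparams:
    """Hypothetical container for the defaults quoted in the table."""
    alpha: float
    beta1: float
    beta2: float = 0.9
    beta3: float = 0.99
    weight_decay: float = 1e-4  # lambda; negative exponent is an assumption
    eps1: float = 1e-6          # negative exponent is an assumption
    eps2: float = 1e-6          # negative exponent is an assumption
    eps3: float = 1e-10         # negative exponent is an assumption


# "1K" is read as 1000 here; the paper may intend 1024.
LARGE_BATCH_THRESHOLD = 1000


def default_hyperparams(batch_size: int) -> AltoHyperparams:
    """Pick alpha and beta1 from the batch size, as the quoted text describes."""
    if batch_size < LARGE_BATCH_THRESHOLD:
        # Small-batch setting: alpha = 0.5, beta1 = 0.01
        return AltoHyperparams(alpha=0.5, beta1=0.01)
    # Large-batch setting: alpha = 5, beta1 = 0.99
    return AltoHyperparams(alpha=5.0, beta1=0.99)


if __name__ == "__main__":
    print(default_hyperparams(256))
    print(default_hyperparams(4096))
```

Only α and β1 depend on the batch size in this sketch; the remaining fields keep the "if not mentioned" values, which matches the quoted statement that the other hyperparameters are rarely tuned.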