Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Learning Provably Improves the Convergence of Gradient Descent
Authors: Qingyu Song, Wei Lin, Hong Xu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our theoretical findings through comprehensive experiments. The results showcase significant performance advantages, including up to a 50% improvement in solution optimality over the standard GD algorithm post-training, and superior robustness compared to SOTA L2O models and the Adam optimizer [10]. |
| Researcher Affiliation | Academia | Qingyu Song Xiamen University EMAIL Wei Lin, Hong Xu The Chinese University of Hong Kong EMAIL, EMAIL |
| Pseudocode | No | The paper includes a computational graph (Figure 1) detailing the Math-L2O forward and backward operations, and various mathematical formulations for derivatives, but does not present a distinct pseudocode or algorithm block with structured steps. |
| Open Source Code | Yes | The code of our method can be found from https://github.com/NetX-lab/MathL2OProof-Official. |
| Open Datasets | Yes | Utilizing a compact Convolutional Neural Network (CNN) on the MNIST dataset, our method achieved significantly faster convergence, thereby corroborating our theoretical findings. |
| Dataset Splits | No | For the synthetic data, the paper states: "vectors $X \in \mathbb{R}^{5120 \times 1}$ and $Y \in \mathbb{R}^{4000 \times 1}$ for Equation (2) are generated by sampling from a standard Gaussian distribution." For MNIST: "The optimization objective is the total cross-entropy loss over 200 randomly selected MNIST samples." While samples are generated or selected, explicit train/validation/test splits are not provided. |
| Hardware Specification | Yes | Experiments are conducted using Python 3.9 and PyTorch 1.12.0 on an Ubuntu 20.04 system equipped with 128GB of RAM and two NVIDIA RTX 3090 GPUs. |
| Software Dependencies | Yes | Experiments are conducted using Python 3.9 and PyTorch 1.12.0 on an Ubuntu 20.04 system equipped with 128GB of RAM and two NVIDIA RTX 3090 GPUs. |
| Experiment Setup | Yes | The Math-L2O model is configured with T = 100 optimization steps (Equation (2)). Its architecture comprises a L = 3-layer DNN, as formulated in Equation (4). The first layer has an output dimension of 2. To ensure over-parameterization, the (L-1)-th (i.e., second) layer's output dimension is set to 512 × 10 = 5120. The final layer produces a scalar output (dimension 1). Three specific model configurations are designed for ablation studies, foundational experiments, and robustness evaluations. These are detailed in Appendix C.1. L2O models are trained using the Stochastic Gradient Descent (SGD) optimizer. |
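
The setup values quoted in the table can be collected into a small configuration sketch. This is only an illustration of the reported numbers, not the authors' code: the class name `MathL2OConfig`, its field names, and the helper `sample_standard_gaussian` are hypothetical, and the input dimension of the first layer is left out because the quoted text does not state it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MathL2OConfig:
    """Hypothetical container for the setup values quoted in the table."""
    num_steps: int = 100          # T = 100 optimization steps
    num_layers: int = 3           # L = 3-layer DNN
    first_out: int = 2            # output dimension of the first layer
    hidden_width: int = 512 * 10  # (L-1)-th layer output dimension, 5120
    final_out: int = 1            # scalar output of the final layer

    def layer_output_dims(self):
        """Return the output dimension of each layer, in order."""
        return [self.first_out, self.hidden_width, self.final_out]


def sample_standard_gaussian(n, seed=0):
    """Draw n values from N(0, 1); the seed is an assumption for repeatability."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


cfg = MathL2OConfig()
# Synthetic vectors with the sizes quoted under "Dataset Splits".
X = sample_standard_gaussian(5120, seed=0)
Y = sample_standard_gaussian(4000, seed=1)
```

Calling `cfg.layer_output_dims()` lists the three layer widths in order, which makes the over-parameterized middle layer easy to see next to the two narrow layers around it.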