Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Global curvature for second-order optimization of neural networks

Authors: Alberto Bernacchia

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the practical implications of our framework, we apply second-order optimization to synthetic data, achieving markedly faster convergence compared to traditional optimization methods.
Researcher Affiliation | Industry | MediaTek Research, Cambridge, UK. Correspondence to: Alberto Bernacchia <EMAIL>.
Pseudocode | Yes | A detailed description of the complete procedure is provided in Algorithm 1 in the Appendix, using the simple case of a two-layer MLP with Tanh activation and no bias.
Open Source Code | Yes | Code: github.com/mtkresearch/symo notebooks
Open Datasets | No | The synthetic dataset consists of 5000 training and 5000 testing data points, where the input is sampled from a Gaussian distribution with zero mean. The covariance matrix of the input is generated using random orthogonal eigenvectors (Mezzadri, 2007), and the eigenvalues are set on a logarithmic grid between 10 5 and 100.
Dataset Splits | Yes | The synthetic dataset consists of 5000 training and 5000 testing data points, where the input is sampled from a Gaussian distribution with zero mean.
Hardware Specification | No | These are matrix-matrix products of size equal to the neural network width, that can be computed efficiently using a GPU.
Software Dependencies | No | In Pytorch for example, Assumption 2.1 holds for nn.init.normal and nn.init.orthogonal...
Experiment Setup | Yes | For all optimizers, learning rate is set by a grid search. For second-order optimizers, we additionally set a second hyperparameter by grid search: damping λ for KFAC, initialization ϵ for Shampoo and decay parameter β for SymO.
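The synthetic inputs described in the Open Datasets and Dataset Splits rows could be generated roughly as follows. This is a minimal NumPy sketch, not the authors' code: the input dimension (`dim = 20`), the seed, and the function names are hypothetical, and the eigenvalue bounds (`1e-5` and `1.0`) are placeholders because the range quoted above lost its exponents in extraction. The orthogonal matrix is drawn with the QR-based recipe usually attributed to Mezzadri (2007). Only inputs are sampled, since the summary does not say how targets are produced.

```python
import numpy as np


def random_orthogonal(dim, rng):
    """Draw a random orthogonal matrix via QR of a Gaussian matrix."""
    z = rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(z)
    # Fix the sign ambiguity of QR so the distribution is uniform (Haar).
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs  # rescales each column of q by +1 or -1


def sample_inputs(n, eigvecs, eigvals, rng):
    """Sample n zero-mean Gaussian rows with covariance Q diag(eigvals) Q^T."""
    z = rng.standard_normal((n, eigvals.shape[0]))
    return (z * np.sqrt(eigvals)) @ eigvecs.T


# Hypothetical settings: dim, seed, and eigenvalue bounds are assumptions.
rng = np.random.default_rng(0)
dim = 20
eigvecs = random_orthogonal(dim, rng)
eigvals = np.logspace(-5.0, 0.0, dim)  # logarithmic grid, placeholder bounds

# 5000 training and 5000 testing points, as stated in the summary.
x_train = sample_inputs(5000, eigvecs, eigvals, rng)
x_test = sample_inputs(5000, eigvecs, eigvals, rng)
```

Train and test sets share the same covariance here and differ only in the random draws, which is one reading of the description above.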