Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Gradient descent with generalized Newton’s method

Authors: Zhiqi Bu, Shiyun Xu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present extensive experiments on language and vision tasks (e.g. GPT and ResNet) to showcase that GeN optimizers match the state-of-the-art performance, which was achieved with carefully tuned learning rate schedulers.
Researcher Affiliation	Collaboration	Zhiqi Bu EMAIL Shiyun Xu University of Pennsylvania EMAIL
Pseudocode	Yes	Algorithm 1 Generalized Newton's optimizers (GeN), e.g. γ = 0.9, Φ = 8
Open Source Code	Yes	Equal contribution. Code available at https://github.com/ShiyunXu/AutoGeN.
Open Datasets	Yes	We train CIFAR10 (Krizhevsky et al., 2009) on ResNet 18, 34, 50, 152 (He et al., 2016) and ViT tiny, small, base and large (Dosovitskiy et al., 2020). For fine tuning, we use the pretrained models from the PyTorch Image Models framework (Wightman, 2019).
Dataset Splits	Yes	We train CIFAR10 (Krizhevsky et al., 2009) on ResNet 18, 34, 50, 152 (He et al., 2016) and ViT tiny, small, base and large (Dosovitskiy et al., 2020)... CIFAR10 and CIFAR100 are standard tiny image datasets that we have used as the test-bed... We evaluate RoBERTa-base (Liu et al., 2019) on the GLUE (Wang et al., 2019) benchmark with LoRA, BitFit and full-parameter training (FT).
Hardware Specification	No	Our default setting is full-parameter training (including mixed precision training), Φ = 1, and on single GPU (no communication cost among devices).
Software Dependencies	No	For fine tuning, we use the pretrained models from the PyTorch Image Models framework (Wightman, 2019). ... following the official Pytorch tutorial
Experiment Setup	Yes	Our default hyperparameters for Figure 1, Figure 2, Figure 3, Figure 4, Figure 5 are: B = 500, Φ = 4, SGD learning rate=1e-2, AdamW learning rate=1e-4, unless one of the hyperparameters are varied for the ablation study. ... In Figure 1, Figure 2, Figure 9 and Table 3, we follow the codebase of Hu et al. and use B = 256, sequence length 128, η0 = 1e-3, and 5 epochs. While applying, we set Φ = 4. ... Columns: task, batch size, initial learning rate for FT, # of epochs, eval metrics. MRPC: 128, 2e-5, 10, F1; SST2: 128, 1e-6, 10, acc.; MNLI: 128, 1e-6, 5 (1 for FT), matched acc. & mismatched acc.; CoLA: 128, 2e-5, 10, Matthews corr.; QNLI: 128, 2e-5, 10, acc.; QQP: 256, 2e-5, 5, F1; RTE: 128, 2e-5, 60, acc.
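
The GLUE fine-tuning settings quoted in the Experiment Setup row are easier to reuse as a lookup table. The sketch below transcribes only the values quoted above; the names `GlueSetup`, `GLUE_SETUPS`, and `epochs_for` are hypothetical and chosen for illustration, and nothing here is taken from the authors' code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlueSetup:
    """One row of the quoted GLUE hyperparameter table (hypothetical container)."""
    batch_size: int
    init_lr_ft: float                 # initial learning rate for full-parameter training (FT)
    epochs: int                       # number of epochs as listed in the table
    metric: str                       # evaluation metric as listed in the table
    epochs_ft: Optional[int] = None   # FT-specific override, only given for MNLI


# Values transcribed from the Experiment Setup row; no other settings are assumed.
GLUE_SETUPS = {
    "MRPC": GlueSetup(128, 2e-5, 10, "F1"),
    "SST2": GlueSetup(128, 1e-6, 10, "acc."),
    "MNLI": GlueSetup(128, 1e-6, 5, "matched acc. & mismatched acc.", epochs_ft=1),
    "CoLA": GlueSetup(128, 2e-5, 10, "Matthews corr."),
    "QNLI": GlueSetup(128, 2e-5, 10, "acc."),
    "QQP": GlueSetup(256, 2e-5, 5, "F1"),
    "RTE": GlueSetup(128, 2e-5, 60, "acc."),
}


def epochs_for(task: str, full_finetune: bool = False) -> int:
    """Return the epoch count for a task, using the FT override when one is listed."""
    setup = GLUE_SETUPS[task]
    if full_finetune and setup.epochs_ft is not None:
        return setup.epochs_ft
    return setup.epochs
```

For example, `epochs_for("MNLI", full_finetune=True)` picks up the "1 for FT" exception, while every other task falls back to its listed epoch count.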