Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens
Authors: Ross M Clarke, José Miguel Hernández-Lobato
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AdamQLR on a range of regression and classification tasks at various scales and hyperparameter tuning methodologies, concluding K-FAC's adaptive heuristics are of variable standalone general effectiveness, and finding an untuned AdamQLR setting can achieve comparable performance vs runtime to tuned benchmarks. 4. Experiments |
| Researcher Affiliation | Academia | Ross M. Clarke 1 José Miguel Hernández-Lobato 1 1University of Cambridge. Correspondence to: Ross M. Clarke <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Adam (Kingma & Ba, 2015); Algorithm 2: AdamQLR |
| Open Source Code | Yes | Code for all our experiments is available at https://github.com/rmclarke/AdamThroughASecondOrderLens. We describe our algorithm fully in Section 3, provide full source code to the reviewers and will publish this code to the community after deanonymisation. |
| Open Datasets | Yes | Rosenbrock (1960) Function, UCI Energy (Tsanas & Xifara, 2012), UCI Protein (Rana, 2013), Fashion-MNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky, 2009), Penn Treebank (Marcus, Mitchell P. et al., 1999; Marcus et al., 1999). Table 4: Licences under which we use datasets in this work |
| Dataset Splits | Yes | otherwise, we separate the standard test set, randomly choose 1/6 (Fashion-MNIST and SVHN) or 1/10 (CIFAR-10) of the remaining data to form a validation set, and use cross-entropy loss. All hyperparameter tuning uses ASHA (Li et al., 2020) over 200 random initialisations, targeting a fixed number of training epochs, subject to a maximum runtime of 15 minutes (only reached for CIFAR-10; see Appendix B.1.4 for experiments using runtime as the primary constraint). |
| Hardware Specification | Yes | Our experiments were performed on one of the two sets of hardware shown in Table 3. All runtime comparisons were performed on like-for-like hardware. We make use of GPU acceleration throughout, with the JAX (Bradbury et al., 2018), Haiku (Hennigan et al., 2020) and KFAC-JAX (Botev & Martens, 2022) libraries, along with various related components of the DeepMind JAX Ecosystem (Babuschkin et al., 2020). Table 3: System configurations used to run our experiments. Consumer Desktop: CPU Intel Core i7-3930K, GPU (NVIDIA) RTX 2080GTX, Python 3.10.11, JAX 0.3.25, CUDA 11.4, cuDNN 8.05. Local Cluster: CPU Intel Core i9-10900X, GPU (NVIDIA) RTX 2080GTX, Python 3.10.11, JAX 0.3.25, CUDA 11.8, cuDNN 8.05. |
| Software Dependencies | Yes | We make use of GPU acceleration throughout, with the JAX (Bradbury et al., 2018), Haiku (Hennigan et al., 2020) and KFAC-JAX (Botev & Martens, 2022) libraries, along with various related components of the DeepMind JAX Ecosystem (Babuschkin et al., 2020). Table 3: System configurations used to run our experiments. Consumer Desktop: CPU Intel Core i7-3930K, GPU (NVIDIA) RTX 2080GTX, Python 3.10.11, JAX 0.3.25, CUDA 11.4, cuDNN 8.05. Local Cluster: CPU Intel Core i9-10900X, GPU (NVIDIA) RTX 2080GTX, Python 3.10.11, JAX 0.3.25, CUDA 11.8, cuDNN 8.05. |
| Experiment Setup | Yes | Except for the Rosenbrock Function and (Untuned) variants, we also tune a batch size over {50, 100, 200, 400, 800, 1 600, 3 200}. All hyperparameter tuning uses ASHA (Li et al., 2020) over 200 random initialisations, targeting a fixed number of training epochs, subject to a maximum runtime of 15 minutes (only reached for CIFAR-10; see Appendix B.1.4 for experiments using runtime as the primary constraint). Table 1: Hyperparameter search spaces for Section 4 Table 2: Optimal hyperparameters used to produce the results of Section 4, Appendix B.1.2 and Appendix B.3 |
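
The Dataset Splits row describes holding out a random 1/6 (Fashion-MNIST and SVHN) or 1/10 (CIFAR-10) of the non-test data as a validation set. A minimal sketch of such a split is shown below. It is not the authors' code: the function name, the seed, and the use of Python's standard `random` module are assumptions made for illustration, and the example sizes are the standard training-set sizes of Fashion-MNIST (60 000) and CIFAR-10 (50 000).

```python
import random


def split_train_validation(num_examples, validation_fraction, seed=0):
    """Randomly partition example indices into train and validation sets.

    Hypothetical helper for illustration only; the seed is an arbitrary choice.
    """
    indices = list(range(num_examples))
    rng = random.Random(seed)  # local generator, so global state is untouched
    rng.shuffle(indices)
    num_validation = round(num_examples * validation_fraction)
    validation_indices = sorted(indices[:num_validation])
    train_indices = sorted(indices[num_validation:])
    return train_indices, validation_indices


# Fashion-MNIST: hold out 1/6 of the 60 000 non-test examples.
fmnist_train, fmnist_val = split_train_validation(60_000, 1 / 6)

# CIFAR-10: hold out 1/10 of the 50 000 non-test examples.
cifar_train, cifar_val = split_train_validation(50_000, 1 / 10)

print(len(fmnist_train), len(fmnist_val))
print(len(cifar_train), len(cifar_val))
```

Fixing the seed makes the split repeatable across runs, which matters when hyperparameter tuning and final evaluation must see the same validation examples.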