Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Distributed Second-Order Optimization using Kronecker-Factored Approximations
Authors: Jimmy Ba, Roger Grosse, James Martens
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally evaluated distributed K-FAC on several large convolutional neural network training tasks involving the CIFAR-10 and ImageNet classification datasets. |
| Researcher Affiliation | Collaboration | Jimmy Ba, University of Toronto; Roger Grosse, University of Toronto; James Martens, University of Toronto and Google DeepMind |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper states: 'We provide a TensorFlow implementation of our approach which is easy to use and can be applied to many existing codebases without modification.' However, it provides neither a link to the source code nor an unambiguous statement that the code is publicly released. |
| Open Datasets | Yes | We experimentally evaluated distributed K-FAC on several large convolutional neural network training tasks involving the CIFAR-10 and ImageNet classification datasets. (Krizhevsky and Hinton, 2009) (Russakovsky et al., 2015) |
| Dataset Splits | No | The paper mentions 'validation curves' and 'validation error' in its figures and discussions (e.g., 'the validation error is often lower than the training error during the first 90% of training'), indicating the use of a validation set, but does not provide specific details on its size, percentage split, or how it was formed. |
| Hardware Specification | Yes | Due to computational resource constraints, we used a single GPU server with 8 NVIDIA K80 GPUs to simulate a large distributed system. The GPUs were used as gradient workers... with the CPUs acting as a parameter server. The Fisher block inversions were performed on the CPUs in parallel, using as many threads as possible. ... our 16-core Xeon 2.2 GHz CPU. |
| Software Dependencies | No | The paper states 'We chose to base our implementation of distributed K-FAC on the TensorFlow framework (Abadi et al., 2016)' but does not specify any version numbers for TensorFlow or other software dependencies. |
| Experiment Setup | Yes | Meta-parameters such as learning rates, damping parameters, and the decay-rate for the second-order statistics, were optimized carefully by hand for each method. The momentum was fixed to 0.9. ... All the CIFAR-10 experiments use a mini-batch size of 512. ... we used the KL-based step-size selection method described in Section 5 with parameters c₀ = 0.01 and ζ = 0.96. The SGD baselines use an exponential learning rate decay schedule with a decay rate of 0.96. Decaying is applied after each half-epoch for distributed K-FAC and SGD+Batch Normalization, and after every two epochs for plain SGD... |
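To make the reported decay schedule concrete, the sketch below implements a staircase exponential decay with rate 0.96 and converts "each half-epoch" and "every two epochs" into optimizer step counts for the reported CIFAR-10 mini-batch size of 512. It is an illustration under stated assumptions, not the authors' code: the 50,000-image training set (the standard CIFAR-10 training split; the paper does not say how its validation set was carved out) and the base learning rate of 0.1 are assumptions, and the function names are hypothetical. The KL-based step-size rule governed by c₀ and ζ is not modeled.

```python
def decay_interval_steps(num_train_examples: int, batch_size: int,
                         epochs_per_decay: float) -> int:
    """Optimizer steps between successive decays, rounded, at least 1."""
    steps_per_epoch = num_train_examples / batch_size
    return max(1, round(steps_per_epoch * epochs_per_decay))


def staircase_lr(base_lr: float, step: int, interval: int,
                 decay_rate: float = 0.96) -> float:
    """Multiply base_lr by decay_rate once per fully completed interval."""
    if step < 0 or interval <= 0:
        raise ValueError("step must be >= 0 and interval must be > 0")
    return base_lr * decay_rate ** (step // interval)


if __name__ == "__main__":
    CIFAR10_TRAIN = 50_000  # assumption: full standard CIFAR-10 training split
    BATCH = 512             # mini-batch size reported for the CIFAR-10 runs
    BASE_LR = 0.1           # hypothetical; the paper tuned learning rates by hand

    # Half-epoch interval (distributed K-FAC, SGD+BN) vs. two-epoch interval (plain SGD).
    half_epoch = decay_interval_steps(CIFAR10_TRAIN, BATCH, 0.5)
    two_epochs = decay_interval_steps(CIFAR10_TRAIN, BATCH, 2.0)

    for epoch in (0, 10, 50):
        step = round(epoch * CIFAR10_TRAIN / BATCH)
        print(f"epoch {epoch:>2}: "
              f"half-epoch lr={staircase_lr(BASE_LR, step, half_epoch):.5f}, "
              f"two-epoch lr={staircase_lr(BASE_LR, step, two_epochs):.5f}")
```

Under these assumptions, a half-epoch interval applies four times as many decays per epoch as the two-epoch interval, so the SGD+BN learning rate shrinks much faster than the plain SGD learning rate for the same number of epochs.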