Distributed Second-Order Optimization using Kronecker-Factored Approximations

Authors: Jimmy Ba, Roger Grosse, James Martens

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally evaluated distributed K-FAC on several large convolutional neural network training tasks involving the CIFAR-10 and ImageNet classification datasets.
Researcher Affiliation | Collaboration | Jimmy Ba (University of Toronto, jimmy@psi.toronto.edu); Roger Grosse (University of Toronto, rgrosse@cs.toronto.edu); James Martens (University of Toronto and Google DeepMind, jmartens@cs.toronto.edu)
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | No | The paper states: 'We provide a TensorFlow implementation of our approach which is easy to use and can be applied to many existing codebases without modification.' However, it does not provide a specific link to the source code or an unambiguous statement of public release.
Open Datasets | Yes | We experimentally evaluated distributed K-FAC on several large convolutional neural network training tasks involving the CIFAR-10 and ImageNet classification datasets. (Krizhevsky and Hinton, 2009; Russakovsky et al., 2015)
Dataset Splits | No | The paper mentions 'validation curves' and 'validation error' in its figures and discussions (e.g., 'the validation error is often lower than the training error during the first 90% of training'), indicating the use of a validation set, but it does not give the set's size, the percentage split, or how the split was formed.
Hardware Specification | Yes | Due to computational resource constraints, we used a single GPU server with 8 NVIDIA K80 GPUs to simulate a large distributed system. The GPUs were used as gradient workers... with the CPUs acting as a parameter server. The Fisher block inversions were performed on the CPUs in parallel, using as many threads as possible. ... our 16-core Xeon 2.2 GHz CPU.
Software Dependencies | No | The paper states 'We chose to base our implementation of distributed K-FAC on the TensorFlow framework (Abadi et al., 2016)' but does not specify version numbers for TensorFlow or any other software dependency.
Experiment Setup | Yes | Meta-parameters such as learning rates, damping parameters, and the decay-rate for the second-order statistics were optimized carefully by hand for each method. The momentum was fixed to 0.9. ... All the CIFAR-10 experiments use a mini-batch size of 512. ... we used the KL-based step size selection method described in Section 5 with parameters c0 = 0.01 and ζ = 0.96. The SGD baselines use an exponential learning rate decay schedule with a decay rate of 0.96. Decaying is applied after each half-epoch for distributed K-FAC and SGD+Batch Normalization, and after every two epochs for plain SGD...
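The decay schedule quoted in the Experiment Setup row can be made concrete with a short sketch. The snippet below is a hedged illustration, not the authors' implementation: only the decay rate (0.96), the mini-batch size (512), and the half-epoch / two-epoch decay intervals are taken from the paper; the base learning rates, the CIFAR-10 epoch-size arithmetic, and the helper name decayed_value are assumptions made for the example.

```python
# Hedged sketch (not the authors' code): a staircase exponential decay of the
# kind described in the Experiment Setup row. Only the decay rate (0.96), the
# mini-batch size (512), and the half-epoch / two-epoch decay intervals come
# from the paper; base learning rates and the helper name are illustrative.

def decayed_value(base, decay_rate, steps_per_interval, global_step):
    """Return base * decay_rate ** floor(global_step / steps_per_interval)."""
    return base * decay_rate ** (global_step // steps_per_interval)

# CIFAR-10 with mini-batch size 512: roughly 50000 / 512 ~ 97 updates per epoch.
steps_per_epoch = 50000 // 512

# SGD+Batch Normalization baseline: decay applied every half-epoch
# (base learning rate of 0.1 is an assumed placeholder).
lr_sgd_bn = decayed_value(0.1, 0.96, steps_per_epoch // 2, global_step=1000)

# Plain SGD baseline: decay applied every two epochs (base rate again assumed).
lr_sgd = decayed_value(0.1, 0.96, steps_per_epoch * 2, global_step=1000)
```

For distributed K-FAC itself, the quoted setup suggests the half-epoch decay is applied through the KL-based step-size selection (c0 = 0.01, ζ = 0.96) rather than a hand-set learning rate, but the paper's quoted text does not spell out that mechanism, so the sketch above only covers the SGD-style schedules.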