A Kronecker-factored approximate Fisher matrix for convolution layers

Authors: Roger Grosse, James Martens

ICML 2016

Reproducibility Variable | Result | LLM Response

Research Type — Experimental
"In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD."

Researcher Affiliation — Academia
"Roger Grosse RGROSSE@CS.TORONTO.EDU; James Martens JMARTENS@CS.TORONTO.EDU; Department of Computer Science, University of Toronto"

Pseudocode — No
The paper describes algorithms and methods but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.

Open Source Code — No
The paper mentions and links to the Toronto Deep Learning Conv Net (TDLCN) package (https://github.com/TorontoDeepLearning/convnet/). However, this is described as a package the authors 'used' ('the TDLCN CUDA kernels we used were carefully tuned...'), not as an open-source release of the KFC method developed in this paper.

Open Datasets — Yes
"We have evaluated our method on two standard image recognition benchmark datasets: CIFAR-10 (Krizhevsky, 2009), and Street View Housing Numbers (SVHN; Netzer et al., 2011)."

Dataset Splits — No
The paper mentions training and test data ('In our experiments, KFC was able to optimize conv nets several times faster than carefully tuned SGD, in terms of both training and test error.') and mini-batch sizes (128 or 512), but it does not give explicit dataset splits (percentages or sample counts for train/validation/test sets) needed to reproduce the data partitioning.

Hardware Specification — Yes
"All experiments for which wall clock time is reported were run on a single Nvidia GeForce GTX Titan Z GPU board."

Software Dependencies — No
The paper mentions using 'CUDAMat (Mnih, 2009)' and 'the Toronto Deep Learning Conv Net (TDLCN) package (Srivastava, 2015)' but does not provide version numbers for these software components.

Experiment Setup — Yes
"For KFC-pre, we used a momentum parameter of 0.9, mini-batches of size 512, and a damping parameter γ = 10⁻³."
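To make the quoted hyperparameters concrete, the sketch below applies one KFC-pre-style preconditioned update to a single layer's weight matrix, using the values cited above (momentum 0.9, damping γ = 10⁻³). This is a simplified illustration, not the paper's implementation: the Kronecker factors A and G are random stand-ins computed from a fake mini-batch, and the factored damping uses π = 1 rather than the paper's adaptive weighting.

```python
import numpy as np

# Hyperparameters quoted in the reproducibility table above.
momentum, gamma, lr = 0.9, 1e-3, 0.01

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W = rng.standard_normal((d_out, d_in))
velocity = np.zeros_like(W)

# Stand-in Kronecker factors, positive semi-definite by construction:
# A from layer inputs, G from backpropagated derivatives (fake data here).
X = rng.standard_normal((32, d_in))   # layer inputs over a mini-batch
D = rng.standard_normal((32, d_out))  # backpropagated output derivatives
A = X.T @ X / 32
G = D.T @ D / 32
grad = D.T @ X / 32                   # gradient of the loss w.r.t. W

# Factored Tikhonov damping: adding sqrt(gamma) to each factor gives
# (G + sqrt(gamma) I) ⊗ (A + sqrt(gamma) I) ≈ G ⊗ A + gamma I.
A_damped = A + np.sqrt(gamma) * np.eye(d_in)
G_damped = G + np.sqrt(gamma) * np.eye(d_out)

# Approximate natural gradient: G_damped^{-1} @ grad @ A_damped^{-1}.
nat_grad = np.linalg.solve(G_damped, grad) @ np.linalg.inv(A_damped)

# Momentum step on the preconditioned gradient.
velocity = momentum * velocity - lr * nat_grad
W = W + velocity
```

The point of the Kronecker factorization is that inverting the two small factors (d_in × d_in and d_out × d_out) is far cheaper than inverting the full (d_in·d_out) × (d_in·d_out) Fisher block.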