A Kronecker-factored approximate Fisher matrix for convolution layers

Authors: Roger Grosse, James Martens

ICML 2016

Reproducibility Variable | Result | LLM Response

Research Type — Experimental
"In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD."

Researcher Affiliation — Academia
"Roger Grosse RGROSSE@CS.TORONTO.EDU; James Martens JMARTENS@CS.TORONTO.EDU; Department of Computer Science, University of Toronto"

Pseudocode — No
The paper describes algorithms and methods but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.

Open Source Code — No
The paper mentions and links to the Toronto Deep Learning Conv Net (TDLCN) package (https://github.com/TorontoDeepLearning/convnet/). However, this is described as a package the authors 'used' ('the TDLCN CUDA kernels we used were carefully tuned...'), not as an open-source release of the KFC method developed in this paper.

Open Datasets — Yes
"We have evaluated our method on two standard image recognition benchmark datasets: CIFAR-10 (Krizhevsky, 2009), and Street View Housing Numbers (SVHN; Netzer et al., 2011)."

Dataset Splits — No
The paper mentions training and test data ('In our experiments, KFC was able to optimize conv nets several times faster than carefully tuned SGD, in terms of both training and test error.') and mini-batch sizes (128 or 512), but it does not give explicit dataset splits (percentages or sample counts for train/validation/test sets) needed to reproduce the data partitioning.

Hardware Specification — Yes
"All experiments for which wall clock time is reported were run on a single Nvidia GeForce GTX Titan Z GPU board."

Software Dependencies — No
The paper mentions using 'CUDAMat (Mnih, 2009)' and 'the Toronto Deep Learning Conv Net (TDLCN) package (Srivastava, 2015)' but does not provide version numbers for these software components.

Experiment Setup — Yes
"For KFC-pre, we used a momentum parameter of 0.9, mini-batches of size 512, and a damping parameter γ = 10⁻³."
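To make the quoted hyperparameters concrete, the sketch below applies one KFC-pre-style preconditioned update to a single layer's weight matrix, using the values cited above (momentum 0.9, damping γ = 10⁻³). This is a simplified illustration, not the paper's implementation: the Kronecker factors A and G are random stand-ins computed from a fake mini-batch, and the factored damping uses π = 1 rather than the paper's adaptive weighting.

```python
import numpy as np

# Hyperparameters quoted in the reproducibility table above.
momentum, gamma, lr = 0.9, 1e-3, 0.01

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W = rng.standard_normal((d_out, d_in))
velocity = np.zeros_like(W)

# Stand-in Kronecker factors, positive semi-definite by construction:
# A from layer inputs, G from backpropagated derivatives (fake data here).
X = rng.standard_normal((32, d_in))   # layer inputs over a mini-batch
D = rng.standard_normal((32, d_out))  # backpropagated output derivatives
A = X.T @ X / 32
G = D.T @ D / 32
grad = D.T @ X / 32                   # gradient of the loss w.r.t. W

# Factored Tikhonov damping: adding sqrt(gamma) to each factor gives
# (G + sqrt(gamma) I) ⊗ (A + sqrt(gamma) I) ≈ G ⊗ A + gamma I.
A_damped = A + np.sqrt(gamma) * np.eye(d_in)
G_damped = G + np.sqrt(gamma) * np.eye(d_out)

# Approximate natural gradient: G_damped^{-1} @ grad @ A_damped^{-1}.
nat_grad = np.linalg.solve(G_damped, grad) @ np.linalg.inv(A_damped)

# Momentum step on the preconditioned gradient.
velocity = momentum * velocity - lr * nat_grad
W = W + velocity
```

The point of the Kronecker factorization is that inverting the two small factors (d_in × d_in and d_out × d_out) is far cheaper than inverting the full (d_in·d_out) × (d_in·d_out) Fisher block.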