A Kronecker-factored approximate Fisher matrix for convolution layers
Authors: Roger Grosse, James Martens
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD. |
| Researcher Affiliation | Academia | Roger Grosse (rgrosse@cs.toronto.edu), James Martens (jmartens@cs.toronto.edu), Department of Computer Science, University of Toronto |
| Pseudocode | No | The paper describes algorithms and methods but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper mentions and links to the Toronto Deep Learning Conv Net (TDLCN) package ('https://github.com/TorontoDeepLearning/convnet/'). However, this is described as a package they 'used' ('the TDLCN CUDA kernels we used were carefully tuned...'), not as the open-source release for the specific KFC methodology developed in this paper. |
| Open Datasets | Yes | We have evaluated our method on two standard image recognition benchmark datasets: CIFAR-10 (Krizhevsky, 2009), and Street View House Numbers (SVHN; Netzer et al., 2011). |
| Dataset Splits | No | The paper mentions using training and testing data ('In our experiments, KFC was able to optimize conv nets several times faster than carefully tuned SGD, in terms of both training and test error.') and discusses mini-batch sizes ('mini-batches of size 128' or '512'), but it does not provide explicit details on the dataset splits (e.g., percentages, sample counts for train/validation/test sets) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | All experiments for which wall clock time is reported were run on a single NVIDIA GeForce GTX Titan Z GPU board. |
| Software Dependencies | No | The paper mentions using 'CUDAMat (Mnih, 2009)' and 'the Toronto Deep Learning Conv Net (TDLCN) package (Srivastava, 2015)' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For KFC-pre, we used a momentum parameter of 0.9, mini-batches of size 512, and a damping parameter γ = 10⁻³. |
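
The KFC-pre hyperparameters quoted above belong to the paper's Kronecker-factored approximation of each convolutional layer's Fisher block, F ≈ Ω ⊗ Γ, where Ω is the second-moment matrix of the im2col-expanded input activations and Γ the second-moment matrix of the backpropagated pre-activation derivatives. The sketch below is an illustrative numpy rendering of how such a preconditioner could be applied to one layer's gradient; it is not the authors' TDLCN code, and the function names, the `im2col` helper, the normalization, and the trace-based choice of the damping split π are assumptions made for the example.

```python
# Minimal sketch (not the authors' implementation) of a KFC-style
# Kronecker-factored preconditioner for one convolutional layer.
import numpy as np

def im2col(x, kh, kw):
    """Extract kh x kw patches from x of shape (N, C, H, W), stride 1, no padding.
    Returns an array of shape (N * H' * W', C * kh * kw)."""
    n, c, h, w = x.shape
    hh, ww = h - kh + 1, w - kw + 1
    cols = np.empty((n, hh, ww, c * kh * kw))
    for i in range(hh):
        for j in range(ww):
            cols[:, i, j, :] = x[:, :, i:i + kh, j:j + kw].reshape(n, -1)
    return cols.reshape(-1, c * kh * kw)

def kfc_precondition(grad_w, patches, pre_act_grads, gamma=1e-3):
    """Apply a Kronecker-factored preconditioner to a conv layer's gradient.

    grad_w        : (C_out, C_in * kh * kw) loss gradient w.r.t. the filter
                    bank, flattened per output channel.
    patches       : (M, C_in * kh * kw) im2col'd input activations.
    pre_act_grads : (M, C_out) backpropagated pre-activation derivatives
                    (one row per spatial location per example).
    gamma         : damping parameter (the paper reports gamma = 1e-3).
    """
    m = patches.shape[0]
    # Kronecker factors: second moments of activations and of derivatives.
    omega = patches.T @ patches / m                    # (D, D), D = C_in*kh*kw
    gamma_fac = pre_act_grads.T @ pre_act_grads / m    # (C_out, C_out)

    # Factored Tikhonov damping: pi splits the damping between the two
    # factors (a trace-based choice is used here for illustration).
    pi = np.sqrt((np.trace(omega) / omega.shape[0]) /
                 (np.trace(gamma_fac) / gamma_fac.shape[0]))
    omega_d = omega + pi * np.sqrt(gamma) * np.eye(omega.shape[0])
    gamma_d = gamma_fac + (1.0 / pi) * np.sqrt(gamma) * np.eye(gamma_fac.shape[0])

    # (Omega ⊗ Gamma)^{-1} vec(G) corresponds to Gamma^{-1} G Omega^{-1}.
    precond = np.linalg.solve(gamma_d, grad_w)       # Gamma^{-1} G
    precond = np.linalg.solve(omega_d, precond.T).T  # ... then Omega^{-1}
    return precond
```

In the paper's full method the factor statistics are accumulated with running averages over mini-batches and their inverses are recomputed only periodically to amortize the cost, rather than recomputed from a single batch as in this simplified sketch.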