Fast Approximate Natural Gradient Descent in a Kronecker Factored Eigenbasis
Authors: Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show improvements over KFAC in optimization speed for several deep network architectures. |
| Researcher Affiliation | Collaboration | Mila, Université de Montréal; Facebook AI Research; CIFAR; equal contribution noted |
| Pseudocode | Yes | Algorithm 1 provides high-level pseudocode of EKFAC for the case of fully-connected layers, and when using it to approximate the empirical Fisher (a hedged re-implementation sketch is given below the table). |
| Open Source Code | No | The paper does not provide an explicit statement about the release of open-source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We consider the task of minimizing the reconstruction error of an 8-layer auto-encoder on the MNIST dataset, a standard task used to benchmark optimization algorithms... In this section, we evaluate our proposed algorithm on the CIFAR-10 dataset using a VGG11 convolutional neural network (Simonyan & Zisserman, 2015) and a Resnet34 (He et al., 2016). |
| Dataset Splits | No | The paper mentions 'validation performance' and includes 'validation' in graph legends (e.g., Figure 4 (c), Figure 6 (c)), indicating a validation set was used. However, it does not provide specific details on the size or split percentage of the validation set. |
| Hardware Specification | No | The paper mentions 'computational resources' in the Acknowledgments but does not provide specific details such as GPU/CPU models or other hardware specifications used for experiments. |
| Software Dependencies | No | The experiments were conducted using PyTorch (Paszke et al., 2017). While PyTorch is mentioned, a specific version number is not provided, nor are other software dependencies with their versions. |
| Experiment Setup | Yes | Grid values for hyperparameters are: learning rate η and damping ε in {10⁻¹, 10⁻², 10⁻³, 10⁻⁴}, mini-batch size in {200, 500}. In addition, 20 values for (η, ε) were explored by random search around each grid point (a small illustration of this search is sketched below the table)... We use a batch size of 500 for the KFAC based approaches and 200 for the SGD baselines. |
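
For reference, here is a minimal sketch of the EKFAC preconditioning step for a single fully-connected layer, in the spirit of the Algorithm 1 pseudocode noted above. This is not the authors' code (none is released per the table); the function name `ekfac_precondition`, the tensor shapes, and the per-step recomputation of the eigenbases are illustrative assumptions, whereas in the full method the eigendecompositions are amortized over many iterations.

```python
import torch

def ekfac_precondition(grad_W, a, g, eps=1e-3):
    """Sketch of one EKFAC preconditioning step for a fully-connected
    layer (weight matrix only), approximating the empirical Fisher.

    grad_W: (out, in)   mini-batch gradient of the loss w.r.t. the weights
    a:      (batch, in)  inputs to the layer
    g:      (batch, out) back-propagated gradients w.r.t. the pre-activations
    eps:    damping added to the re-estimated second moments
    """
    batch = a.shape[0]

    # KFAC-style Kronecker factors of the (empirical) Fisher.
    A = a.t() @ a / batch            # (in, in)
    G = g.t() @ g / batch            # (out, out)

    # Kronecker-factored eigenbasis (recomputed here every call for
    # simplicity; in practice this step is amortized).
    _, U_A = torch.linalg.eigh(A)
    _, U_G = torch.linalg.eigh(G)

    # EKFAC's eigenvalue correction: second moments of the per-example
    # gradients expressed in that eigenbasis.
    a_eig = a @ U_A                  # (batch, in)
    g_eig = g @ U_G                  # (batch, out)
    s = (g_eig.pow(2).t() @ a_eig.pow(2)) / batch   # (out, in)

    # Precondition the gradient in the eigenbasis and map it back.
    grad_eig = U_G.t() @ grad_W @ U_A
    return U_G @ (grad_eig / (s + eps)) @ U_A.t()
```

The split between a slowly updated eigenbasis and cheaply re-estimated scalings `s` is what the paper identifies as the source of EKFAC's speed advantage over plain KFAC.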
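The hyperparameter search quoted in the Experiment Setup row can also be illustrated with a short sketch. The grid values come from the quoted text; the log-uniform perturbation used for the "random search around each grid point" is an assumed interpretation, not a detail given in the paper.

```python
import itertools
import random

# Grid values quoted from the paper's experiment setup.
learning_rates = [1e-1, 1e-2, 1e-3, 1e-4]   # η
dampings = [1e-1, 1e-2, 1e-3, 1e-4]          # ε
batch_sizes = [200, 500]

grid = list(itertools.product(learning_rates, dampings, batch_sizes))

def around(value, spread=0.5):
    # Assumed sampling scheme: perturb a grid value log-uniformly
    # within half an order of magnitude.
    return value * 10 ** random.uniform(-spread, spread)

# "20 values for (η, ε) by random search around each grid point".
random_points = [
    (around(lr), around(eps), bs)
    for lr, eps, bs in grid
    for _ in range(20)
]
```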