Scaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix
Authors: Roger Grosse, Ruslan Salakhutdinov
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first evaluate FANG by comparing the accuracy of the approximation Ĝ_fac with various generic approximations to PSD matrices. Next, we evaluate its ability to train binary restricted Boltzmann machines as generative models, compared with SGD, both with and without the centering trick. ... Our RBM training experiments were conducted on two datasets: the MNIST handwritten digit dataset... and the more complex Omniglot dataset... |
| Researcher Affiliation | Academia | Roger B. Grosse (RGROSSE@CS.TORONTO.EDU), Ruslan Salakhutdinov (RSALAKHU@CS.TORONTO.EDU), Department of Computer Science, University of Toronto |
| Pseudocode | Yes | Algorithm 1 Factorized Natural Gradient (FANG) for binary RBMs |
| Open Source Code | No | The paper does not provide a link or explicit statement about the availability of open-source code for the described methodology. |
| Open Datasets | Yes | Our RBM training experiments were conducted on two datasets: the MNIST handwritten digit dataset... and the more complex Omniglot dataset of handwritten characters in a variety of world languages (Lake et al., 2013). |
| Dataset Splits | No | Our RBM training experiments were conducted on two datasets: the MNIST handwritten digit dataset... and the more complex Omniglot dataset... (Lake et al., 2013). ... We used 2000 PCD particles, mini-batches of size 2000, and a learning rate schedule of α√(γ/(γ + t)), where t is the update count, γ = 1000, and α was tuned separately for each algorithm. |
| Hardware Specification | No | Our implementation made use of the CUDAMat (Mnih, 2009) and Gnumpy (Tieleman, 2010) libraries for GPU linear algebra operations. |
| Software Dependencies | No | Our implementation made use of the CUDAMat (Mnih, 2009) and Gnumpy (Tieleman, 2010) libraries for GPU linear algebra operations. |
| Experiment Setup | Yes | We used 2000 PCD particles, mini-batches of size 2000, and a learning rate schedule of α√(γ/(γ + t)), where t is the update count, γ = 1000, and α was tuned separately for each algorithm. |
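
The learning-rate schedule quoted in the setup row is straightforward to reproduce. Below is a minimal Python sketch of that schedule; the function name and the default value of α are illustrative assumptions (the paper tuned α separately per algorithm and does not report a single value), while γ = 1000 and the √(γ/(γ + t)) form come from the quoted text.

```python
import math

def learning_rate(t, alpha=0.1, gamma=1000.0):
    """Decaying schedule from the paper's setup: alpha * sqrt(gamma / (gamma + t)).

    t     : update count
    alpha : base rate; the paper tunes this per algorithm (0.1 here is an
            illustrative placeholder, not a value reported in the paper)
    gamma : decay constant; the paper uses gamma = 1000
    """
    return alpha * math.sqrt(gamma / (gamma + t))

# Example: the rate halves once t reaches 3 * gamma updates.
print(learning_rate(0))     # 0.1
print(learning_rate(3000))  # 0.05
```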