Scaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix

Authors: Roger Grosse, Ruslan Salakhutdinov

ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment (variable, result, and the LLM response with supporting excerpt from the paper):

Research Type: Experimental
"In this section, we first evaluate FANG by comparing the accuracy of the approximation G_fac with various generic approximations to PSD matrices. Next, we evaluate its ability to train binary restricted Boltzmann machines as generative models, compared with SGD, both with and without the centering trick. ... Our RBM training experiments were conducted on two datasets: the MNIST handwritten digit dataset... and the more complex Omniglot dataset..."

Researcher Affiliation: Academia
"Roger B. Grosse (rgrosse@cs.toronto.edu), Ruslan Salakhutdinov (rsalakhu@cs.toronto.edu), Department of Computer Science, University of Toronto"

Pseudocode: Yes
"Algorithm 1: Factorized Natural Gradient (FANG) for binary RBMs" (a hedged sketch of the update this algorithm approximates appears after this table)

Open Source Code: No
The paper does not provide a link or an explicit statement about the availability of open-source code for the described methodology.

Open Datasets: Yes
"Our RBM training experiments were conducted on two datasets: the MNIST handwritten digit dataset... and the more complex Omniglot dataset of handwritten characters in a variety of world languages (Lake et al., 2013)."

Dataset Splits: No
"Our RBM training experiments were conducted on two datasets: the MNIST handwritten digit dataset... and the more complex Omniglot dataset... (Lake et al., 2013). ... We used 2000 PCD particles, mini-batches of size 2000, and a learning rate schedule of α√(γ/(γ + t)), where t is the update count, γ = 1000, and α was tuned separately for each algorithm."

Hardware Specification: No
"Our implementation made use of the CUDAMat (Mnih, 2009) and Gnumpy (Tieleman, 2010) libraries for GPU linear algebra operations."

Software Dependencies: No
"Our implementation made use of the CUDAMat (Mnih, 2009) and Gnumpy (Tieleman, 2010) libraries for GPU linear algebra operations." (a brief, assumed usage sketch of these libraries follows the table)

Experiment Setup: Yes
"We used 2000 PCD particles, mini-batches of size 2000, and a learning rate schedule of α√(γ/(γ + t)), where t is the update count, γ = 1000, and α was tuned separately for each algorithm." (a short sketch of this schedule follows the table)
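
The paper's Algorithm 1 (FANG for binary RBMs) is not reproduced in this report. For reference, the sketch below shows the generic damped natural-gradient step that FANG approximates: the Fisher matrix is estimated from per-sample sufficient statistics (e.g., the PCD particles) and the gradient is preconditioned by its inverse. FANG's contribution is to replace the dense inverse with a sparse Gaussian graphical model fit to those statistics; the dense solve here, along with all function and parameter names, is an illustrative assumption, not the authors' implementation.

import numpy as np

def damped_natural_gradient_step(theta, grad, stats, lr=0.1, damping=1e-2):
    """One generic natural-gradient step for an exponential-family model
    such as a binary RBM.

    stats: array of shape (num_samples, num_params); each row holds the
    sufficient statistics of one model sample (e.g., one PCD particle).
    The empirical Fisher matrix is the covariance of these statistics.
    FANG avoids forming and inverting this matrix explicitly by fitting a
    sparse Gaussian graphical model to the statistics; this sketch uses a
    dense, damped solve purely to illustrate the update being approximated.
    """
    centered = stats - stats.mean(axis=0, keepdims=True)
    fisher = centered.T @ centered / stats.shape[0]   # empirical Fisher estimate
    fisher += damping * np.eye(fisher.shape[0])       # damping for numerical stability
    nat_grad = np.linalg.solve(fisher, grad)          # approximate G^{-1} * gradient
    return theta + lr * nat_grad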
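
The learning rate schedule quoted in the Dataset Splits and Experiment Setup rows, α√(γ/(γ + t)) with γ = 1000, t the update count, and α tuned separately for each algorithm, can be written directly. The function name below is an illustrative choice, not from the paper.

import math

def learning_rate(t, alpha, gamma=1000.0):
    """Learning rate at update count t: alpha * sqrt(gamma / (gamma + t)).

    gamma = 1000 matches the value reported in the paper; alpha was tuned
    separately for each algorithm, so it is left as a required argument.
    """
    return alpha * math.sqrt(gamma / (gamma + t))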
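
For the Software Dependencies row: the paper states only that CUDAMat and Gnumpy were used for GPU linear algebra, without further detail. The snippet below is a minimal, assumed usage sketch of Gnumpy's numpy-like interface (garray, dot, as_numpy_array) to show the kind of call such an implementation would rely on; it is not the authors' code, and the exact shapes are arbitrary.

import numpy as np
import gnumpy as gnp  # Gnumpy (Tieleman, 2010); uses CUDAMat for its GPU kernels

# Move two matrices to the GPU and multiply them there (assumed usage).
a = gnp.garray(np.random.randn(512, 256))
b = gnp.garray(np.random.randn(256, 128))
c = gnp.dot(a, b)              # GPU matrix multiply
result = c.as_numpy_array()    # copy the product back to host memory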