Training Restricted Boltzmann Machines via the Thouless-Anderson-Palmer Free Energy

Authors: Marylou Gabrié, Eric W. Tramel, Florent Krzakala

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the performance of the proposed deterministic EMF RBM training algorithm, we perform a number of numerical experiments over two separate datasets and compare these results with both CD-1 and PCD. We first use the MNIST dataset of labeled handwritten digit images [25]. Second, we use the 28 × 28 pixel version of the Caltech 101 Silhouette dataset [26].
Researcher Affiliation | Academia | Marylou Gabrié, Eric W. Tramel, Florent Krzakala; Laboratoire de Physique Statistique, UMR 8550 CNRS, École Normale Supérieure & Université Pierre et Marie Curie, 75005 Paris, France; {marylou.gabrie, eric.tramel}@lps.ens.fr, florent.krzakala@ens.fr
Pseudocode | No | The paper describes iterative update rules (Eqs. 9, 10) in Section 4.1 but does not present them within a clearly labeled 'Pseudocode' or 'Algorithm' block (a hedged sketch of these updates follows the table).
Open Source Code | Yes | Available as a Julia package at https://github.com/sphinxteam/Boltzmann.jl
Open Datasets | Yes | We first use the MNIST dataset of labeled handwritten digit images [25]. Second, we use the 28 × 28 pixel version of the Caltech 101 Silhouette dataset [26].
Dataset Splits | Yes | The dataset is split between 60,000 training images and 10,000 test images. Both subsets contain approximately the same fraction of the ten digit classes (0 to 9). Each image is comprised of 28 × 28 pixels taking values in the range [0, 255]. The MNIST dataset was binarized by setting all non-zero pixels to 1 in all experiments. Second, we use the 28 × 28 pixel version of the Caltech 101 Silhouette dataset [26]... The dataset is split between training (4,100 images), validation (2,264 images), and test (2,304 images) sets.
Hardware Specification | Yes | For a Julia implementation of the tested RBM training techniques running on a 3.2 GHz Intel i5 processor, we report the 10-trial average wall times for fitting a single 100-sample batch, normalized against the model complexity.
Software Dependencies | No | The paper mentions a 'Julia implementation' and use of the 'scikit-learn toolbox [28]' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | For both datasets, the RBM models require 784 visible units. Following previous studies evaluating RBMs on these datasets, we fix the number of RBM hidden units to 500 in all our experiments. During training, we adopt the mini-batch learning procedure for gradient averaging, with 100 training points per batch for MNIST and 256 training points per batch for Caltech 101 Silhouette. [...] We also employ weight-decay regularization in all trainings to keep the weights small, a necessity for the weak-coupling expansion on which the EMF relies. [...] Our final experiments consist of persistent training algorithms using 3 iterations of the magnetization self-consistency relations (P-MF, P-TAP2, and P-TAP3) and one persistent training algorithm using 30 iterations (P-TAP2-30) for comparison. (A sketch of one persistent EMF batch update follows the table.)
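
For context on the Pseudocode row: Eqs. 9-10 of the paper are self-consistency relations for the visible and hidden magnetizations. Below is a minimal Julia sketch of the second-order (TAP2) updates for a binary-binary RBM; the function names and the undamped iteration schedule are our own assumptions, not the paper's code or the Boltzmann.jl API.

```julia
# Minimal sketch of the TAP2 magnetization self-consistency updates
# (Eqs. 9-10 of the paper) for a binary-binary RBM. Names are illustrative.

sigm(x) = 1 ./ (1 .+ exp.(-x))

# W: weights (n_visible × n_hidden); a, b: visible/hidden biases;
# mv, mh: current magnetization estimates, entries in (0, 1).
function tap2_update(W, a, b, mv, mh)
    W2 = W .^ 2
    # Mean-field term plus the second-order (Onsager) correction for visibles
    mv = sigm(a .+ W * mh .- (W2 * (mh .- mh .^ 2)) .* (mv .- 0.5))
    # Hidden update uses the freshly updated visible magnetizations
    mh = sigm(b .+ W' * mv .- (W2' * (mv .- mv .^ 2)) .* (mh .- 0.5))
    return mv, mh
end

# The experiments run a small, fixed number of these iterations per
# gradient step: 3 for P-MF/P-TAP2/P-TAP3, 30 for P-TAP2-30.
function tap2_magnetizations(W, a, b, mv, mh; iters = 3)
    for _ in 1:iters
        mv, mh = tap2_update(W, a, b, mv, mh)
    end
    return mv, mh
end
```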
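Similarly, for the persistent EMF training described in the Experiment Setup row, one mini-batch update might look like the sketch below (reusing `sigm` and `tap2_magnetizations` from above). The hyperparameter names (`lr`, `decay`) and the single shared magnetization state are simplifying assumptions for illustration, not the paper's exact procedure or the Boltzmann.jl interface.

```julia
# Hedged sketch of one persistent TAP2 (P-TAP2) mini-batch update: the
# magnetizations returned here seed the next batch's TAP iterations.
function ptap2_batch!(W, a, b, mv, mh, V; lr = 0.005, decay = 1e-4, iters = 3)
    # V: binary mini-batch, one column per sample (e.g. MNIST binarized
    # by setting all non-zero pixels to 1), size n_visible × n_batch.
    nbatch = size(V, 2)
    # Data-dependent ("clamped") statistics: hidden conditionals given data
    Hp = sigm(b .+ W' * V)
    # Model term from the persistent TAP2 magnetizations
    mv, mh = tap2_magnetizations(W, a, b, mv, mh; iters = iters)
    # Gradient ascent on the EMF log-likelihood, with weight decay keeping
    # the couplings in the weak-coupling regime the expansion requires
    W .+= lr .* ((V * Hp') ./ nbatch .- mv * mh') .- decay .* W
    a .+= lr .* (vec(sum(V, dims = 2)) ./ nbatch .- mv)
    b .+= lr .* (vec(sum(Hp, dims = 2)) ./ nbatch .- mh)
    return mv, mh  # carry across batches for persistence
end
```

A caller would initialize mv and mh (e.g. uniformly at 0.5) and thread the returned magnetizations through successive mini-batches, which is what makes the scheme persistent.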