Equilibrium and non-Equilibrium regimes in the learning of Restricted Boltzmann Machines

Authors: Aurélien Decelle, Cyril Furtlehner, Beatriz Seoane

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we show that this mixing time plays a crucial role in the dynamics and stability of the trained model, and that RBMs operate in two well-defined regimes, namely equilibrium and out-of-equilibrium, depending on the interplay between this mixing time of the model and the number of steps, k, used to approximate the gradient. We further show empirically that this mixing time increases with the learning... All our conclusions rely on the study of 9 datasets: MNIST [24], whose results are discussed in the main text, Fashion MNIST [25], Caltech101 Silhouettes, the small NORB dataset [26], a human genome dataset [27] (with dimensions similar to MNIST but more structured), the high-quality CelebA [28] projected to black and white, and in low definition but in color, and CIFAR [29]. The analyses of most of these datasets are discussed only in the SM and show behavior similar to that observed on MNIST. (A minimal sketch of the k-step gradient estimate appears after the table.)
Researcher Affiliation | Academia | 1. Departamento de Física Teórica, Universidad Complutense, 28040 Madrid, Spain. 2. Université Paris-Saclay, CNRS, INRIA Tau team, LISN, 91190 Gif-sur-Yvette, France.
Pseudocode | No | The paper describes the different training schemes (Rdm-k, CD-k, PCD-k) and their steps in textual form, but it does not include any clearly labeled pseudocode or algorithm blocks. (A sketch of how the schemes differ follows the table.)
Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We released a version of the code on GitHub with some scripts to reproduce part of the main experimental results. The datasets are public.
Open Datasets | Yes | All our conclusions rely on the study of 9 datasets: MNIST [24], whose results are discussed in the main text, Fashion MNIST [25], Caltech101 Silhouettes, the small NORB dataset [26], a human genome dataset [27] (with dimensions similar to MNIST but more structured), the high-quality CelebA [28] projected to black and white, and in low definition but in color, and CIFAR [29].
Dataset Splits | No | The paper mentions evaluating the log-likelihood of the 'train/test set' in Section 3 and details training parameters in Section 4, stating that 'The training details are specified in the supplementary material.' However, it does not explicitly specify the proportions or sizes of the training, validation, or test splits in the main text.
Hardware Specification | No | The paper states in the checklist, 'We put in the SM estimate of the training time of our experiments and the used material,' indicating that hardware specifications are detailed in the supplementary material but not in the main paper itself.
Software Dependencies | No | The paper does not explicitly mention any specific software dependencies, along with their version numbers, that would be necessary to replicate the experiments.
Experiment Setup | Yes | During the learning, we update the parameters 2×10^5 times using the gradient estimated following the Rdm-k scheme (with k = 10, 50, 100, 500, 10^3, and 10^4). We save the RBM parameters at different numbers of updates, equally spaced on a logarithmic scale. ... Unless otherwise mentioned, we will consider RBMs with Nh = 500 hidden nodes trained with a fixed learning rate of η = 0.01; hence the different RBMs will differ only by the value of k and t_age. (A sketch of this checkpoint schedule follows the table.)
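
To make the role of k concrete, here is a minimal single-sample sketch (Python/NumPy) of the k-step gradient estimate discussed above, for a binary RBM with weights W, visible bias b, and hidden bias c. The function names and the single-sample setup are illustrative assumptions, not taken from the paper's released code; when k is short compared to the model's mixing time, the negative term below is the biased, out-of-equilibrium estimate the paper studies.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gibbs_step(v, W, b, c):
        # One alternating Gibbs step: sample h given v, then v given h.
        h = (rng.random(c.shape) < sigmoid(v @ W + c)).astype(float)
        v = (rng.random(b.shape) < sigmoid(h @ W.T + b)).astype(float)
        return v

    def grad_k(v_data, v_chain, W, b, c, k):
        # Run the negative chain for k Gibbs steps, then form the usual
        # positive-minus-negative estimate of the log-likelihood gradient.
        for _ in range(k):
            v_chain = gibbs_step(v_chain, W, b, c)
        p_h_data = sigmoid(v_data @ W + c)    # positive phase, clamped to data
        p_h_model = sigmoid(v_chain @ W + c)  # negative phase, model sample
        dW = np.outer(v_data, p_h_data) - np.outer(v_chain, p_h_model)
        db = v_data - v_chain
        dc = p_h_data - p_h_model
        return dW, db, dc, v_chain  # chain state returned so PCD-k can reuse it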
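The three training schemes named above differ, on our reading of the text, only in where this k-step negative chain starts at each parameter update. A hedged sketch (the helper name init_chain is ours):

    def init_chain(scheme, v_data, persistent, rng):
        # Starting point of the k-step negative chain, per scheme.
        if scheme == "Rdm-k":
            # fresh uniform-random configuration at every update
            return (rng.random(v_data.shape) < 0.5).astype(float)
        if scheme == "CD-k":
            # chain starts from the data sample itself
            return v_data.copy()
        if scheme == "PCD-k":
            # chain continues from where it stopped at the previous update
            return persistent
        raise ValueError(f"unknown scheme: {scheme!r}")

For PCD-k, the chain state returned by grad_k above would be stored back into persistent for the next update.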
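Finally, the checkpoint schedule quoted in the setup (2×10^5 updates, parameters saved at update counts equally spaced on a logarithmic scale) can be generated as below; the number of checkpoints is our assumption, as the main text does not state it.

    import numpy as np

    n_updates = 200_000  # 2x10^5 parameter updates, as quoted above
    n_checkpoints = 50   # assumed; not given in the main text
    save_at = set(np.unique(
        np.logspace(0, np.log10(n_updates), n_checkpoints).astype(int)
    ).tolist())

    for t in range(1, n_updates + 1):
        # ... one gradient update with the chosen scheme and value of k ...
        if t in save_at:
            pass  # save (W, b, c), labeled by the model age t_age = t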