Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

How to Center Deep Boltzmann Machines

Authors: Jan Melchior, Asja Fischer, Laurenz Wiskott

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Furthermore, we present numerical simulations suggesting that (i) optimal generative performance is achieved by subtracting mean values from visible as well as hidden variables, (ii) centered binary RBMs/DBMs reach significantly higher log-likelihood values than normal binary RBMs/DBMs, (iii) centering variants whose offsets depend on the model mean, like the enhanced gradient, suffer from severe divergence problems, (iv) learning is stabilized if an exponentially moving average over the batch means is used for the offset values instead of the current batch mean, which also prevents the enhanced gradient from severe divergence, (v) on a similar level of log-likelihood values centered binary RBMs/DBMs have smaller weights and bigger bias parameters than normal binary RBMs/DBMs, (vi) centering leads to an update direction that is closer to the natural gradient, which is extremely efficient for training as we show for small binary RBMs, (vii) centering eliminates the need for greedy layer-wise pre-training of DBMs, which often even deteriorates the results independently of whether centering is used or not, and (viii) centering is also beneficial for autoencoders. Our experimental setups are described in Section 5 before we empirically analyze the performance of centered RBMs with different initializations, offset parameters, sampling methods, and learning rates in Section 6. This empirical analysis includes a comparison of the centered gradient with the natural gradient and extensive experiments on 10 real-world data sets.
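Points (i) and (iv) of the quoted findings can be sketched concretely: subtract offsets from visible and hidden states in the gradient, track the offsets with an exponentially moving average over batch means, and reparameterize the biases when the offsets shift so the modeled distribution is unchanged. The following is a minimal NumPy sketch of one such CD-1 update; the function name `centered_cd1_update` and its exact argument conventions are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def centered_cd1_update(W, b, c, mu, lam, v_data, lr=0.01, ema=0.01, rng=None):
    """One hypothetical centered-gradient CD-1 step for a binary RBM.

    W: (D, H) weights; b: (D,) visible bias; c: (H,) hidden bias;
    mu: (D,) visible offsets; lam: (H,) hidden offsets; v_data: (n, D) batch.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Positive phase: hidden probabilities given the (centered) data.
    h_prob = sigmoid((v_data - mu) @ W + c)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

    # Negative phase: a single Gibbs step (hence CD-1).
    v_prob = sigmoid((h_sample - lam) @ W.T + b)
    v_model = (rng.random(v_prob.shape) < v_prob).astype(float)
    h_model = sigmoid((v_model - mu) @ W + c)

    # Update offsets with an EMA over batch means (finding (iv) above),
    # shifting the biases so the energy function is unchanged.
    mu_new = (1 - ema) * mu + ema * v_data.mean(axis=0)
    lam_new = (1 - ema) * lam + ema * h_prob.mean(axis=0)
    b = b + W @ (lam_new - lam)
    c = c + W.T @ (mu_new - mu)
    mu, lam = mu_new, lam_new

    # Centered gradient: outer products of offset-subtracted states.
    n = v_data.shape[0]
    dW = ((v_data - mu).T @ (h_prob - lam)
          - (v_model - mu).T @ (h_model - lam)) / n
    db = (v_data - v_model).mean(axis=0)
    dc = (h_prob - h_model).mean(axis=0)
    return W + lr * dW, b + lr * db, c + lr * dc, mu, lam
```

With offsets fixed at zero this reduces to ordinary CD-1, which is one way to see why centering is a strict generalization of standard RBM training.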
Researcher Affiliation | Academia | Jan Melchior EMAIL, Asja Fischer EMAIL, Laurenz Wiskott EMAIL — Institut für Neuroinformatik, Ruhr-Universität Bochum, Bochum, D-44801, Germany
Pseudocode | Yes | Algorithm 1: Training centered RBMs
Open Source Code | Yes | The implementation of the algorithms proposed and analyzed in this work is part of the Python library PyDeep, publicly available at https://github.com/MelJan/PyDeep.
Open Datasets | Yes | The Bars & Stripes (MacKay, 2003) data set consists of patterns of size D × D... The MNIST (LeCun et al., 1998) data set of handwritten digits... The Caltech 101 Silhouettes (Marlin et al., 2010) data set... In some experiments we also considered the eight UCI binary (Larochelle et al., 2010; Larochelle and Murray, 2011) data sets
Dataset Splits | Yes | The MNIST (LeCun et al., 1998) data set... It consists of 60,000 training and 10,000 testing examples of gray value handwritten digits... The Caltech 101 Silhouettes (Marlin et al., 2010) data set consists of 4100 training, 2307 validation, and 2264 testing examples... All UCI binary data sets... have been separated into training, validation, and test sets.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, processors, or memory amounts) are provided for the experimental setup. The paper discusses computational costs in general terms but does not specify the hardware used to conduct the experiments.
Software Dependencies | No | The implementation of the algorithms proposed and analyzed in this work is part of the Python library PyDeep, publicly available at https://github.com/MelJan/PyDeep. However, no specific version numbers for this library or any other software dependencies are provided.
Experiment Setup | Yes | For all models in this work the weight matrices were initialized with random values sampled from a Gaussian with zero mean and a standard deviation of 0.01. If not stated otherwise the visible biases, hidden biases, and offsets were initialized as described in Section 4.2. We began our analysis with experiments on small RBMs where the LL can be calculated exactly, where we used 4 hidden units when modeling Bars & Stripes and Shifting Bar and 16 hidden units when modeling MNIST. For training we used CD-1, PCD-1 and PTc (with c = 10 or c = 20) where the c temperatures were distributed uniformly from 0 to 1. For Bars & Stripes and Shifting Bar full-batch training was performed for 50,000 gradient updates, where the LL was evaluated every 50th gradient update. For modeling MNIST mini-batch training with a batch size of 100 was performed for 100 epochs, each consisting of 600 gradient updates and the exact LL was evaluated after each epoch. Note that in order to get an unbiased comparison of the different models, we did not use any additional modifications of the update rule like a momentum term, weight decay or an annealing learning rate.
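The quoted initialization (weights from N(0, 0.01²), biases and offsets per the paper's Section 4.2) can be sketched as follows. The choice of visible biases as the inverse sigmoid of the data mean, hidden biases at zero, and hidden offsets at 0.5 is a common convention consistent with centering and is stated here as an assumption; the helper name `init_centered_rbm` is hypothetical.

```python
import numpy as np

def init_centered_rbm(train_data, n_hidden, rng=None):
    """Hypothetical initializer matching the quoted setup.

    train_data: (n, D) binary training batch; returns (W, b, c, mu, lam).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_visible = train_data.shape[1]

    # Weights ~ Gaussian with zero mean and standard deviation 0.01.
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))

    # Visible bias = inverse sigmoid of the data mean (clipped to avoid log(0));
    # this makes the initial marginals match the data statistics.
    p = np.clip(train_data.mean(axis=0), 1e-3, 1 - 1e-3)
    b = np.log(p / (1.0 - p))

    c = np.zeros(n_hidden)                     # hidden bias
    mu = p.copy()                              # visible offsets at data mean
    lam = np.full(n_hidden, 0.5)               # hidden offsets at 0.5
    return W, b, c, mu, lam
```

For the small-RBM experiments above, `n_hidden` would be 4 for Bars & Stripes and Shifting Bar and 16 for MNIST.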