Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Monotone deep Boltzmann machines

Authors: Zhili Feng, Ezra Winston, J. Zico Kolter

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a proof of concept, we evaluate our proposed mDBM on the MNIST and CIFAR-10 datasets. We demonstrate how to jointly model missing pixels and class labels conditioned on only a subset of observed pixels. On MNIST, we compare mDBM to mean-field inference in a traditional deep RBM. Despite being small-scale tasks, the goal here is to demonstrate joint inference and learning over what is still a reasonably-sized joint model, considering the number of hidden units. Nonetheless, the current experiments are admittedly largely a demonstration of the proposed method rather than a full accounting of its performance. We also show how our mean-field inference method compares to those proposed in prior works. On the joint imputation and classification task, we train models using our updates and the updates proposed by Krähenbühl & Koltun (2013) and Baqué et al. (2016), and perform mean-field inference in each model using all three update methods, with and without the monotonicity constraint. Pixel imputation is shown in Figure 3. We report the image imputation ℓ2 loss on MNIST in Table 1. We additionally evaluate mDBM on a task in which random 14×14 patches are masked. We evaluate mDBM on an analogous task of image pixel imputation and label prediction on CIFAR-10. The imputation error is reported in Table 2.
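The masked-patch imputation task quoted above can be illustrated with a small sketch. All helper names here are assumptions (the paper's actual implementation is in PyTorch); this stdlib-only version just shows masking a random 14×14 patch and measuring the ℓ2 error over the hidden pixels:

```python
import random

def mask_patch(image, patch=14, seed=0):
    """Mask a random patch x patch square of a 2-D image (list of rows).

    Returns (masked_image, mask), where mask[r][c] is True for observed
    pixels and False for hidden ones. Illustrative helper only; the paper
    does not give this code.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    top = rng.randrange(h - patch + 1)
    left = rng.randrange(w - patch + 1)
    mask = [[not (top <= r < top + patch and left <= c < left + patch)
             for c in range(w)] for r in range(h)]
    masked = [[image[r][c] if mask[r][c] else 0.0 for c in range(w)]
              for r in range(h)]
    return masked, mask

def imputation_l2(pred, target, mask):
    """l2 imputation error measured over the hidden (masked-out) pixels only."""
    sq = sum((pred[r][c] - target[r][c]) ** 2
             for r in range(len(mask)) for c in range(len(mask[0]))
             if not mask[r][c])
    return sq ** 0.5
```

A 28×28 MNIST-sized image masked this way hides exactly 196 pixels, and a perfect imputation scores an ℓ2 error of zero.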
Researcher Affiliation | Collaboration | Zhili Feng (EMAIL), Machine Learning Department, Carnegie Mellon University; Ezra Winston (EMAIL), Machine Learning Department, Carnegie Mellon University; J. Zico Kolter (EMAIL), Computer Science Department, Carnegie Mellon University, and Bosch Center for AI
Pseudocode | Yes | Algorithm 1 (Forward Iteration), Algorithm 2 (Backward Iteration), Algorithm 3 (Training)
Open Source Code | Yes | We describe the details in the appendix and include an efficient PyTorch function implementation in the supplementary material.
Open Datasets | Yes | As a proof of concept, we evaluate our proposed mDBM on the MNIST and CIFAR-10 datasets.
Dataset Splits | Yes | For the joint imputation and classification task, we randomly mask each pixel independently with probability 60%, such that in expectation only 40% of the pixels are observed. We randomly mask a fraction p ∈ {0.2, 0.4, 0.6, 0.8} of the inputs. For each p, the experiments are conducted 5 times, where each run independently chooses the random mask. For CIFAR-10, we train for 100 epochs using standard data augmentation; during the first 10 epochs, the weight on the reconstruction loss is ramped up from 0.0 to 0.5 and the weight on the classification loss is ramped down from 1.0 to 0.5; also during the first 20 epochs, the percentage of observed pixels is ramped down from 100% to 50%.
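The CIFAR-10 ramp schedules described above can be sketched as a simple linear ramp. The helper name and schedule shapes are assumptions; the paper does not give code for this:

```python
def linear_ramp(epoch, start, end, ramp_epochs):
    """Linearly ramp a value from `start` to `end` over the first
    `ramp_epochs` epochs, then hold at `end`. Illustrative helper only."""
    if epoch >= ramp_epochs:
        return end
    return start + (end - start) * epoch / ramp_epochs

# Schedules as described for CIFAR-10 over 100 epochs (assumed shapes):
recon_w = [linear_ramp(e, 0.0, 0.5, 10) for e in range(100)]  # reconstruction-loss weight
cls_w   = [linear_ramp(e, 1.0, 0.5, 10) for e in range(100)]  # classification-loss weight
obs_pct = [linear_ramp(e, 1.0, 0.5, 20) for e in range(100)]  # observed-pixel fraction
```

Under this reading, the reconstruction weight reaches 0.5 at epoch 10, the classification weight drops to 0.5 at the same point, and the observed-pixel fraction reaches 50% at epoch 20 and stays there.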
Hardware Specification | No | The paper states "we derive a highly efficient GPU-based implementation" but does not specify any particular GPU model or other hardware details.
Software Dependencies | No | The paper mentions a "PyTorch function implementation" but does not provide a specific version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | Treating the image reconstruction as a dense classification task, we use cross-entropy loss and class weights (1 − β)/(1 − β^{n_i}) with β = 0.9999 (Cui et al., 2019), where n_i is the number of times pixels with intensity i appear in the hidden pixels. For classification, we use standard cross-entropy loss. To enable joint training, we put equal weight of 0.5 on both task losses and backpropagate through their sum. For both tasks, we put τ_i Φ(q)_i into the cross-entropy loss as logits, as described in Equation (17). To achieve faster damped forward-backward iteration, we implement Anderson acceleration (Walker & Ni, 2011), and stop the fixed-point update as soon as the relative difference between two iterations (that is, ‖q^{t+1} − q^t‖ / ‖q^t‖) is less than 0.01, unless we hit a maximum of 50 allowed iterations. For prox_α f and the damped iteration, we set α = 0.125. We use the Adam optimizer with learning rate 0.001. For MNIST, we train for 40 epochs. For CIFAR-10, we train for 100 epochs using standard data augmentation; during the first 10 epochs, the weight on the reconstruction loss is ramped up from 0.0 to 0.5 and the weight on the classification loss is ramped down from 1.0 to 0.5; also during the first 20 epochs, the percentage of observed pixels is ramped down from 100% to 50%. The deep RBM is trained using the CD-1 algorithm for 100 epochs with a batch size of 128 and a learning rate of 0.01.
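Two pieces of the setup above lend themselves to a short sketch: the class-balanced weights (1 − β)/(1 − β^{n_i}) from Cui et al. (2019), and the relative-difference stopping rule for the damped fixed-point iteration. Function names are assumptions, and Anderson acceleration is omitted for brevity:

```python
import math

def class_balanced_weights(counts, beta=0.9999):
    """Class-balanced weights (1 - beta) / (1 - beta**n_i), per Cui et al.
    (2019); counts[i] is how often intensity class i appears among the
    hidden pixels. Illustrative helper, not the paper's code."""
    return [(1.0 - beta) / (1.0 - beta ** n) for n in counts]

def fixed_point_iterate(f, q0, tol=0.01, max_iter=50):
    """Run fixed-point updates q <- f(q), stopping once the relative change
    ||q_{t+1} - q_t|| / ||q_t|| falls below tol, or after max_iter updates.
    Sketch only: the paper additionally uses damping and Anderson
    acceleration (Walker & Ni, 2011)."""
    q = list(q0)
    for _ in range(max_iter):
        q_next = f(q)
        num = math.sqrt(sum((a - b) ** 2 for a, b in zip(q_next, q)))
        den = math.sqrt(sum(b ** 2 for b in q)) or 1.0
        q = q_next
        if num / den < tol:
            break
    return q
```

Note that a rare class (small n_i) receives a weight near 1, while a very frequent class has its weight shrunk toward 1 − β, which is the intended rebalancing effect.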