Predicting distributions with Linearizing Belief Networks
Authors: Yann Dauphin, David Grangier
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section evaluates the modeling power of LBNs and other stochastic networks on multi-modal distributions. In particular, we will experimentally confirm the claim that LBNs learn faster and generalize better than other stochastic networks described in the literature. To do so, we consider the tasks of modeling facial expressions and image denoising on benchmark datasets. |
| Researcher Affiliation | Industry | Yann N. Dauphin, David Grangier Facebook AI Research 1 Hacker Way Menlo Park, CA 94025, USA {ynd,grangier}@fb.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper links to videos/demonstrations ('http://ynd.github.io/lbn_denoising_demo/') but does not provide access to the source code for the described method. There is no statement about releasing code and no link to a code repository. |
| Open Datasets | Yes | The pictures are taken from the Toronto Face Dataset (TFD) (Susskind et al., 2010)... We extract 19 × 19 image patches from the Imagenet dataset. |
| Dataset Splits | Yes | Following the setting of Tang & Salakhutdinov (2013), we randomly selected 95 subjects with 1,318 images for training, 5 subjects with 68 images for validation and 24 individuals totaling 343 images were used as a test set. |
| Hardware Specification | Yes | All experiments are run on the same hardware (Nvidia Tesla K40m GPUs) |
| Software Dependencies | No | The paper mentions using the Adam optimizer and Glorot & Bengio parameter initialization. However, it does not specify software dependencies with version numbers (e.g., specific Python, TensorFlow, or PyTorch versions) needed for replication. |
| Experiment Setup | Yes | We train networks with the Adam (Kingma & Ba, 2014) gradient-based optimizer and the parameter initialization of (Glorot & Bengio, 2010). We found it was optimal to initialize the biases of all units in the gating networks to 2 to promote sparsity. The hyper-parameters of the network are cross-validated using a grid search where the learning rate is always taken from {10⁻³, 10⁻⁴, 10⁻⁵}, while the other hyper-parameters are found in a task-specific manner. The networks were trained for 200 iterations on the training set with up to k = 200 Monte Carlo samples to estimate the expectation over outcomes. The stochastic networks are trained with 4 layers with either 128 or 256 deterministic hidden units. ReLU activations are used for the deterministic units as they were found to be good for continuous problems. The 2 intermediary layers are augmented with either 32 or 64 random Bernoulli units. The number of hidden units in the LBNs was chosen from {128, 256} with the number of hidden layers fixed to 1. The gating network has 2 hidden layers with {64, 128} hidden units. (A hedged code sketch of this setup follows the table.) |
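
The Experiment Setup row carries enough architectural and optimization detail to sketch the training configuration in code. The following is a minimal, hypothetical sketch, not the authors' implementation (none is released, per the Open Source Code row): it assumes an LBN-style layer in which a linear projection is gated elementwise by Bernoulli samples from a small gating network whose output biases are initialized to 2, trained with Adam and a Monte Carlo average over gate samples. The straight-through gradient trick, module names, and the use of 19 × 19 patches for a toy denoising step are illustrative assumptions; the paper's exact layer formulation and gradient estimator should be taken from the published description.

```python
# Hypothetical sketch of the setup quoted above -- not the authors' released code.
import torch
import torch.nn as nn


class GatedLinearLayer(nn.Module):
    """Linear units modulated by stochastic Bernoulli gates (illustrative only)."""

    def __init__(self, d_in, d_hidden, d_gate_hidden):
        super().__init__()
        self.linear = nn.Linear(d_in, d_hidden)
        # Gating network with 2 hidden layers, as described in the setup row.
        self.gater = nn.Sequential(
            nn.Linear(d_in, d_gate_hidden), nn.ReLU(),
            nn.Linear(d_gate_hidden, d_gate_hidden), nn.ReLU(),
            nn.Linear(d_gate_hidden, d_hidden),
        )
        # Biases of the gating units initialized to 2 to promote sparsity.
        nn.init.constant_(self.gater[-1].bias, 2.0)

    def forward(self, x):
        p = torch.sigmoid(self.gater(x))  # gate-open probabilities
        # Straight-through sampling so gradients reach the gating network;
        # a generic workaround, not necessarily the paper's estimator.
        g = p + (torch.bernoulli(p) - p).detach()
        return self.linear(x) * g         # gated (linear) activations


def mc_expected_output(model, x, k=200):
    """Monte Carlo estimate of the expected output over k gate samples."""
    return torch.stack([model(x) for _ in range(k)]).mean(dim=0)


if __name__ == "__main__":
    d = 19 * 19  # flattened 19x19 patches, as in the denoising experiments
    model = nn.Sequential(GatedLinearLayer(d, 256, 128), nn.Linear(256, d))
    # Learning rate drawn from the quoted grid {1e-3, 1e-4, 1e-5}.
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    clean = torch.rand(8, d)
    noisy = clean + 0.1 * torch.randn_like(clean)
    loss = ((mc_expected_output(model, noisy, k=20) - clean) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Grid-searching the learning rate and the hidden-unit counts ({128, 256} for the LBN layer, {64, 128} for the gating network) would wrap this in an outer cross-validation loop over the splits described in the Dataset Splits row.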