Predicting distributions with Linearizing Belief Networks
Authors: Yann Dauphin, David Grangier
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section evaluates the modeling power of LBNs and other stochastic networks on multi-modal distributions. In particular, we will experimentally confirm the claim that LBNs learn faster and generalize better than other stochastic networks described in the literature. To do so, we consider the tasks of modeling facial expressions and image denoising on benchmark datasets. |
| Researcher Affiliation | Industry | Yann N. Dauphin, David Grangier Facebook AI Research 1 Hacker Way Menlo Park, CA 94025, USA {ynd,grangier}@fb.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper links to videos/demonstrations ('http://ynd.github.io/lbn_denoising_demo/') but does not provide access to the source code for the described method. There is no statement about releasing code and no link to a code repository. |
| Open Datasets | Yes | The pictures are taken from the Toronto Face Dataset (TFD) (Susskind et al., 2010)... We extract 19 × 19 image patches from the Imagenet dataset. |
| Dataset Splits | Yes | Following the setting of Tang & Salakhutdinov (2013), we randomly selected 95 subjects with 1,318 images for training, 5 subjects with 68 images for validation and 24 individuals totaling 343 images were used as a test set. |
| Hardware Specification | Yes | All experiments are run on the same hardware (Nvidia Tesla K40m GPUs) |
| Software Dependencies | No | The paper mentions using the Adam optimizer and Glorot & Bengio parameter initialization. However, it does not specify software dependencies with version numbers (e.g., specific Python, TensorFlow, or PyTorch versions) needed for replication. |
| Experiment Setup | Yes | We train networks with the Adam (Kingma & Ba, 2014) gradient-based optimizer and the parameter initialization of (Glorot & Bengio, 2010). We found it was optimal to initialize the biases of all units in the gating networks to 2 to promote sparsity. The hyper-parameters of the network are cross-validated using a grid search where the learning rate is always taken from {10⁻³, 10⁻⁴, 10⁻⁵}, while the other hyper-parameters are found in a task-specific manner. The networks were trained for 200 iterations on the training set with up to k = 200 Monte Carlo samples to estimate the expectation over outcomes. The stochastic networks are trained with 4 layers with either 128 or 256 deterministic hidden units. ReLU activations are used for the deterministic units as they were found to be good for continuous problems. The 2 intermediary layers are augmented with either 32 or 64 random Bernoulli units. The number of hidden units in the LBNs was chosen from {128, 256} with the number of hidden layers fixed to 1. The gating network has 2 hidden layers with {64, 128} hidden units. (A hedged code sketch of this setup follows the table.) |
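
The Experiment Setup row carries enough architectural and optimization detail to sketch the training configuration in code. The following is a minimal, hypothetical sketch, not the authors' implementation (none is released, per the Open Source Code row): it assumes an LBN-style layer in which a linear projection is gated elementwise by Bernoulli samples from a small gating network whose output biases are initialized to 2, trained with Adam and a Monte Carlo average over gate samples. The straight-through gradient trick, module names, and the use of 19 × 19 patches for a toy denoising step are illustrative assumptions; the paper's exact layer formulation and gradient estimator should be taken from the published description.

```python
# Hypothetical sketch of the setup quoted above -- not the authors' released code.
import torch
import torch.nn as nn


class GatedLinearLayer(nn.Module):
    """Linear units modulated by stochastic Bernoulli gates (illustrative only)."""

    def __init__(self, d_in, d_hidden, d_gate_hidden):
        super().__init__()
        self.linear = nn.Linear(d_in, d_hidden)
        # Gating network with 2 hidden layers, as described in the setup row.
        self.gater = nn.Sequential(
            nn.Linear(d_in, d_gate_hidden), nn.ReLU(),
            nn.Linear(d_gate_hidden, d_gate_hidden), nn.ReLU(),
            nn.Linear(d_gate_hidden, d_hidden),
        )
        # Biases of the gating units initialized to 2 to promote sparsity.
        nn.init.constant_(self.gater[-1].bias, 2.0)

    def forward(self, x):
        p = torch.sigmoid(self.gater(x))  # gate-open probabilities
        # Straight-through sampling so gradients reach the gating network;
        # a generic workaround, not necessarily the paper's estimator.
        g = p + (torch.bernoulli(p) - p).detach()
        return self.linear(x) * g         # gated (linear) activations


def mc_expected_output(model, x, k=200):
    """Monte Carlo estimate of the expected output over k gate samples."""
    return torch.stack([model(x) for _ in range(k)]).mean(dim=0)


if __name__ == "__main__":
    d = 19 * 19  # flattened 19x19 patches, as in the denoising experiments
    model = nn.Sequential(GatedLinearLayer(d, 256, 128), nn.Linear(256, d))
    # Learning rate drawn from the quoted grid {1e-3, 1e-4, 1e-5}.
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    clean = torch.rand(8, d)
    noisy = clean + 0.1 * torch.randn_like(clean)
    loss = ((mc_expected_output(model, noisy, k=20) - clean) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Grid-searching the learning rate and the hidden-unit counts ({128, 256} for the LBN layer, {64, 128} for the gating network) would wrap this in an outer cross-validation loop over the splits described in the Dataset Splits row.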