Deep Temporal Sigmoid Belief Networks for Sequence Modeling

Authors: Zhe Gan, Chunyuan Li, Ricardo Henao, David E. Carlson, Lawrence Carin

NeurIPS 2015

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on bouncing balls, polyphonic music, motion capture, and text streams show that the proposed approach achieves state-of-the-art predictive performance, and has the capacity to synthesize various sequences. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708. {zhe.gan, chunyuan.li, r.henao, david.carlson, lcarin}@duke.edu |
| Pseudocode | No | The paper describes algorithms in text but does not provide structured pseudocode or an algorithm block in the main body. |
| Open Source Code | Yes | Code is available at https://github.com/zhegan27/TSBN_code_NIPS2015. |
| Open Datasets | Yes | We present experimental results on four publicly available datasets: the bouncing balls [9], polyphonic music [10], motion capture [7] and state-of-the-Union [30]. |
| Dataset Splits | No | The paper specifies training and testing sets, for example "generated 4000 videos for training, and another 200 videos for testing" for the bouncing balls dataset, but does not explicitly mention or quantify a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details, such as exact GPU/CPU models or memory, used for running its experiments. |
| Software Dependencies | No | The paper mentions using a "variant of RMSprop" and other training parameters, but does not provide specific software dependencies such as library names with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Model parameters were initialized by sampling randomly from N(0, 0.001^2 I), except for the bias parameters, which were initialized to 0. The TSBN model is trained using a variant of RMSprop [6], with momentum of 0.9 and a constant learning rate of 10^-4. The decay over the root mean squared gradients is set to 0.95. The maximum number of iterations used is 10^5. The gradient estimates were computed using a single sample from the recognition model. The only regularization used was a weight decay of 10^-4. The data-dependent baseline was implemented using a neural network with a single hidden layer of 100 tanh units. |
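
The experiment setup quoted above can be summarized in code. The following is a minimal NumPy sketch of an RMSprop-with-momentum update and the reported initialization and baseline network, using the hyperparameter values from the paper. The paper only cites "a variant of RMSprop [6]" without spelling out its exact update rule, so the update ordering, the epsilon constant, and all function and variable names here are illustrative assumptions, not taken from the authors' released code.

```python
import numpy as np

# Hyperparameter values quoted in the paper's experiment setup.
LEARNING_RATE = 1e-4   # constant learning rate
MOMENTUM = 0.9
RMS_DECAY = 0.95       # decay over the root mean squared gradients
WEIGHT_DECAY = 1e-4    # the only regularization reported
EPS = 1e-8             # numerical-stability constant (assumed; not stated in the paper)

def init_params(shape, rng):
    """Sample initial weights from N(0, 0.001^2 I); bias parameters are set to 0 separately."""
    return rng.normal(loc=0.0, scale=0.001, size=shape)

def rmsprop_momentum_step(param, grad, ms, vel):
    """One parameter update of an RMSprop-with-momentum variant (illustrative form).

    ms  -- running average of squared gradients
    vel -- momentum (velocity) buffer
    """
    grad = grad + WEIGHT_DECAY * param                           # weight decay as an L2 penalty
    ms = RMS_DECAY * ms + (1.0 - RMS_DECAY) * grad ** 2          # update RMS statistics
    vel = MOMENTUM * vel - LEARNING_RATE * grad / (np.sqrt(ms) + EPS)
    return param + vel, ms, vel

def baseline_net(x, W1, b1, W2, b2):
    """Data-dependent baseline: a single hidden layer of 100 tanh units (shapes assumed)."""
    h = np.tanh(x @ W1 + b1)   # W1: (input_dim, 100)
    return h @ W2 + b2         # W2: (100, 1), one scalar baseline value per input

# Example usage inside a training loop (gradients would come from the recognition model):
# rng = np.random.default_rng(0)
# W = init_params((100, 50), rng)
# ms, vel = np.zeros_like(W), np.zeros_like(W)
# W, ms, vel = rmsprop_momentum_step(W, grad_W, ms, vel)
```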