Deep Temporal Sigmoid Belief Networks for Sequence Modeling
Authors: Zhe Gan, Chunyuan Li, Ricardo Henao, David E. Carlson, Lawrence Carin
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on bouncing balls, polyphonic music, motion capture, and text streams show that the proposed approach achieves state-of-the-art predictive performance, and has the capacity to synthesize various sequences. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708. {zhe.gan, chunyuan.li, r.henao, david.carlson, lcarin}@duke.edu |
| Pseudocode | No | The paper describes algorithms in text but does not provide structured pseudocode or an algorithm block in the main body. |
| Open Source Code | Yes | Code is available at https://github.com/zhegan27/TSBN_code_NIPS2015. |
| Open Datasets | Yes | We present experimental results on four publicly available datasets: the bouncing balls [9], polyphonic music [10], motion capture [7], and State of the Union [30]. |
| Dataset Splits | No | The paper specifies training and testing sets, for example, 'generated 4000 videos for training, and another 200 videos for testing' for the bouncing balls dataset, but does not explicitly mention or quantify a validation dataset split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or memory used for running its experiments. |
| Software Dependencies | No | The paper mentions using a 'variant of RMSprop' and other training parameters, but does not provide specific software dependency details like library names with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Model parameters were initialized by sampling randomly from N(0, 0.001²I), except for the bias parameters, which were initialized to 0. The TSBN model is trained using a variant of RMSprop [6], with momentum of 0.9 and a constant learning rate of 10⁻⁴. The decay over the root mean squared gradients is set to 0.95. The maximum number of iterations we use is 10⁵. The gradient estimates were computed using a single sample from the recognition model. The only regularization we used was a weight decay of 10⁻⁴. The data-dependent baseline was implemented by using a neural network with a single hidden layer with 100 tanh units. |
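
The experiment-setup row above pins down the optimizer hyperparameters and initialization but not the exact update rule of the "variant of RMSprop [6]". Below is a minimal sketch in plain NumPy of one plausible reading: RMSprop-style root-mean-squared gradient scaling combined with momentum and L2 weight decay, using the paper's stated values (learning rate 10⁻⁴, momentum 0.9, RMS decay 0.95, weight decay 10⁻⁴). The function names, the bias-naming convention, and the `eps` stability constant are assumptions for illustration; the authors' released code may implement the variant differently.

```python
import numpy as np

def init_params(shapes, seed=0):
    """Initialization per the paper: weights ~ N(0, 0.001^2 I), biases = 0.

    `shapes` maps parameter names to array shapes; treating names that
    start with "b" as biases is an illustrative convention, not the paper's.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for name, shape in shapes.items():
        if name.startswith("b"):      # bias parameters initialized to 0
            params[name] = np.zeros(shape)
        else:                         # weights drawn from N(0, 0.001^2 I)
            params[name] = rng.normal(0.0, 0.001, size=shape)
    return params

def rmsprop_momentum_step(params, grads, sq_avg, vel,
                          lr=1e-4, momentum=0.9, decay=0.95,
                          weight_decay=1e-4, eps=1e-8):
    """One update of an RMSprop variant with momentum, using the paper's
    settings. `eps` is a standard numerical-stability constant that the
    paper does not state."""
    for k in params:
        g = grads[k] + weight_decay * params[k]              # L2 weight decay
        sq_avg[k] = decay * sq_avg[k] + (1 - decay) * g**2   # running mean of squared grads
        vel[k] = momentum * vel[k] - lr * g / (np.sqrt(sq_avg[k]) + eps)
        params[k] += vel[k]
    return params, sq_avg, vel
```

In the paper's setup, `grads` would come from a single-sample estimate under the recognition model, with variance reduced by the data-dependent baseline (a one-hidden-layer network with 100 tanh units), and training would run for at most 10⁵ such iterations.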