Factored Temporal Sigmoid Belief Networks for Sequence Learning

Authors: Jiaming Song, Zhe Gan, Lawrence Carin

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed approach achieves state-of-the-art predictive and classification performance on sequential data, and has the capacity to synthesize sequences, with controlled style transitioning and blending.
Researcher Affiliation | Academia | Jiaming Song (JIAMING.TSONG@GMAIL.COM), Zhe Gan (ZHE.GAN@DUKE.EDU), Lawrence Carin (LCARIN@DUKE.EDU); Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China; Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA
Pseudocode | No | The paper describes the model formulation and learning process using mathematical equations and textual explanations, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Sections 5.1-5.4 report the results of training several models with data from the CMU Motion Capture Database. Specifically, we consider two datasets: (i) motion sequences performed by subject 35 (Taylor et al., 2006) (mocap2)... (ii) motion sequences performed by subject 137 (Taylor & Hinton, 2009) (mocap10)... The weather prediction dataset (Liu et al., 2010)... We select 4 books (i.e., Napoleon the Little, The Common Law, Mysticism and Logic and The Bible) in the Gutenberg corpus...
Dataset Splits | No | For mocap2, 'We used 33 running and walking sequences, partitioned them into 31 training sequences and 2 test sequences'. For mocap10, 'For each style, we select 90% of the sequences as training data, and use the rest as testing data.' A distinct validation split is not explicitly mentioned with percentages or counts (a minimal split sketch follows the table).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions 'RMSprop (Tieleman & Hinton, 2012)' as the optimization method and 'word2vec (Mikolov et al., 2013)' for word embeddings, but it does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For CTSBN, the model parameters for weights are initialized by sampling from N(0, 0.001²I), whereas the bias parameters are initialized as zero. For FCTSBN, the parameters are initialized differently, since the actual initialization value of the weight parameters depends on the product of factors. To ensure faster convergence, the initial values of Wa, Wb and Wc are sampled from N(0, 0.01²I). We use RMSprop (Tieleman & Hinton, 2012) throughout all the experiments. ... Our models have 100 hidden units (and 50 factors for factored models) in each layer and the order of n = 1... A single-layer Hidden Markov FCTSBN with 100 hidden units and order of n = 12 is trained on the dataset for 400 epochs, whereas the parameters are updated 10 times for each epoch. We set a fixed learning rate of 3 × 10⁻³, and a decay rate of 0.9. The data-dependent baseline is implemented with a single-hidden-layer neural network with 100 tanh units. We update the estimated learning signal with a momentum of 0.9.
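
As a rough illustration of the Experiment Setup row, the Python/NumPy sketch below reproduces the reported initialization and optimizer settings. The array shapes, the variable names (n_hidden, n_factors, n_visible), and the RMSprop epsilon are assumptions made for illustration; only the sampling distributions, the learning rate of 3 × 10⁻³, and the decay rate of 0.9 come from the paper's text.

```python
# Hedged sketch of the reported initialization and RMSprop settings.
# Shapes, variable names, and epsilon are illustrative assumptions.
import numpy as np

rng = np.random.RandomState(0)
n_hidden, n_factors, n_visible = 100, 50, 60  # n_visible is a placeholder

# CTSBN: weight matrices sampled from N(0, 0.001^2 I); biases start at zero.
W = rng.normal(0.0, 0.001, size=(n_hidden, n_visible))
b = np.zeros(n_hidden)

# FCTSBN: factor matrices sampled from N(0, 0.01^2 I), since the effective
# weights are a product of factors (factorization shapes are assumptions).
Wa = rng.normal(0.0, 0.01, size=(n_hidden, n_factors))
Wb = rng.normal(0.0, 0.01, size=(n_factors, n_factors))
Wc = rng.normal(0.0, 0.01, size=(n_factors, n_visible))

def rmsprop_step(param, grad, cache, lr=3e-3, decay=0.9, eps=1e-8):
    """One RMSprop update with the reported learning rate and decay rate."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    return param - lr * grad / (np.sqrt(cache) + eps), cache
```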
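
The per-style 90%/10% partition quoted in the Dataset Splits row could be reproduced along these lines; the container sequences_by_style, the random seed, and the use of NumPy are assumptions, since the paper does not describe how the partition was implemented.

```python
# Hedged sketch of the per-style 90% train / 10% test split reported for mocap10.
import numpy as np

def split_by_style(sequences_by_style, train_frac=0.9, seed=0):
    """Split each style's sequences into train/test sets; no validation split is reported."""
    rng = np.random.RandomState(seed)
    train, test = {}, {}
    for style, seqs in sequences_by_style.items():
        order = rng.permutation(len(seqs))
        n_train = int(round(train_frac * len(seqs)))
        train[style] = [seqs[i] for i in order[:n_train]]
        test[style] = [seqs[i] for i in order[n_train:]]
    return train, test
```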