A Recurrent Latent Variable Model for Sequential Data

Authors: Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C. Courville, Yoshua Bengio

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate the proposed model against other related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in the RNN dynamics. ... We evaluate the proposed VRNN model on two tasks: (1) modelling natural speech directly from the raw audio waveforms; (2) modelling handwriting generation.
Researcher Affiliation | Academia | Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio. Department of Computer Science and Operations Research, Université de Montréal; CIFAR Senior Fellow. {firstname.lastname}@umontreal.ca
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at http://www.github.com/jych/nips2015_vrnn
Open Datasets | Yes | We evaluate the models on the following four speech datasets: 1. Blizzard: This text-to-speech dataset, made available by the Blizzard Challenge 2013, contains 300 hours of English spoken by a single female speaker [10]. 2. TIMIT: This widely used dataset for benchmarking speech recognition systems contains 6,300 English sentences read by 630 speakers. 3. Onomatopoeia: This is a set of 6,738 non-linguistic human-made sounds such as coughing, screaming, laughing and shouting, recorded from 51 voice actors. 4. Accent: This dataset contains English paragraphs read by 2,046 different native and non-native English speakers [19]. ... Handwriting generation: We let each model learn a sequence of (x, y) coordinates together with binary indicators of pen-up/pen-down, using the IAM-OnDB dataset, which consists of 13,040 handwritten lines written by 500 writers [14].
Dataset Splits | Yes | Except the TIMIT dataset, the rest of the datasets do not have predefined train/test splits. We shuffle and divide the data into train/validation/test splits using a ratio of 0.9/0.05/0.05. ... The final model was chosen with early-stopping based on the validation performance.
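To make the split protocol concrete, here is a minimal Python sketch of the shuffle-and-divide step described above. This is not the authors' code; the function name, the in-memory list of sequences, and the random seed are illustrative assumptions.

    import numpy as np

    def split_sequences(sequences, seed=1234):
        # Shuffle once, then carve out 90% / 5% / 5% for train/valid/test,
        # mirroring the 0.9/0.05/0.05 ratio quoted above.
        # The seed and data structure are assumptions for illustration.
        rng = np.random.RandomState(seed)
        order = rng.permutation(len(sequences))
        n_train = int(0.90 * len(sequences))
        n_valid = int(0.05 * len(sequences))
        train = [sequences[i] for i in order[:n_train]]
        valid = [sequences[i] for i in order[n_train:n_train + n_valid]]
        test = [sequences[i] for i in order[n_train + n_valid:]]
        return train, valid, test

TIMIT would keep its predefined train/test split and skip this step, with early stopping still driven by validation performance.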
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only mentions general setup parameters.
Software Dependencies | No | The paper mentions the optimizer used for training and the public code release, but it does not specify the software libraries, frameworks, or version numbers required to run the experiments.
Experiment Setup | Yes | The only preprocessing used in our experiments is normalizing each sequence using the global mean and standard deviation computed from the entire training set. We train each model with stochastic gradient descent on the negative log-likelihood using the Adam optimizer [12], with a learning rate of 0.001 for TIMIT and Accent and 0.0003 for the rest. We use a minibatch size of 128 for Blizzard and Accent and 64 for the rest. ... We fix each model to have a single recurrent hidden layer with 2000 LSTM units (in the case of Blizzard, 4000 and for IAM-OnDB, 1200). All of ϕτ shown in Eqs. (5)–(7), (9) have four hidden layers using rectified linear units [15] (for IAM-OnDB, we use a single hidden layer). ... Note that we use 20 mixture components for models using a GMM as the output function.
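For readers reconstructing this setup, the per-dataset hyperparameters quoted above can be gathered in one place. The sketch below only restates the numbers reported in the paper; the dictionary layout, names, and the normalization helper are illustrative assumptions rather than the authors' code.

    import numpy as np

    # Learning rate, minibatch size, and LSTM units per dataset, as reported.
    HYPERPARAMS = {
        "Blizzard":     {"lr": 0.0003, "batch_size": 128, "lstm_units": 4000},
        "TIMIT":        {"lr": 0.001,  "batch_size": 64,  "lstm_units": 2000},
        "Onomatopoeia": {"lr": 0.0003, "batch_size": 64,  "lstm_units": 2000},
        "Accent":       {"lr": 0.001,  "batch_size": 128, "lstm_units": 2000},
        "IAM-OnDB":     {"lr": 0.0003, "batch_size": 64,  "lstm_units": 1200},
    }
    N_GMM_COMPONENTS = 20  # for models using a GMM output function

    def normalize(train_seqs, other_seqs):
        # Global mean/std are computed from the training set only and then
        # applied to every split, as in the quoted preprocessing step.
        # The small epsilon guarding against division by zero is an assumption.
        flat = np.concatenate([np.asarray(s).ravel() for s in train_seqs])
        mean, std = flat.mean(), flat.std() + 1e-8
        scale = lambda seqs: [(np.asarray(s) - mean) / std for s in seqs]
        return scale(train_seqs), scale(other_seqs)

The quoted setup does not say whether normalization uses a single scalar statistic or per-feature statistics; the scalar version above is one reading, which coincides with the per-feature case for raw 1-D waveforms.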