Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

Authors: Wei-Ning Hsu, Yu Zhang, James Glass

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and to reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
Researcher Affiliation | Academia | Wei-Ning Hsu, Yu Zhang, and James Glass, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. {wnhsu,yzhang87,glass}@csail.mit.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We have released the code for the model described in this paper: https://github.com/wnhsu/FactorizedHierarchicalVAE
Open Datasets | Yes | The following two corpora are used for our experiments: (1) TIMIT [10]... (2) Aurora-4 [32], a broadband corpus designed for noisy speech recognition tasks based on the Wall Street Journal corpus (WSJ0) [31].
Dataset Splits | Yes | Two 14-hour training sets are used, where one is clean and the other is a mix of all four conditions. The same noise types and microphones are used to generate the development and test sets, which both consist of 330 utterances under all 14 conditions, resulting in 4,620 utterances in total for each set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only discusses the neural network architectures and training procedures.
Software Dependencies | No | The paper mentions the algorithms used, such as Adam and LSTM, but does not provide specific version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The dimensions of z1, z2, µ2 are 16, 16, 16 respectively... All the LSTM networks are 1-layered with 256 cells... All the MLP networks have two layers... The dimension of the output layer for MLP networks for mean is 16, and for log variance is 16... The learning rate is fixed at 10^-4... The mini-batch size is 128.
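For anyone attempting a reimplementation, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is ours, not from the paper or its released code; all field names are hypothetical, and the defaults simply transcribe the values quoted above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FHVAEConfig:
    """Hypothetical config mirroring the hyperparameters quoted in the paper."""
    # Latent dimensions: z1 (sequence-level), z2 (segment-level),
    # and mu2 (sequence-dependent prior mean for z2) are all 16-dimensional.
    z1_dim: int = 16
    z2_dim: int = 16
    mu2_dim: int = 16
    # All LSTM networks are 1-layered with 256 cells.
    lstm_layers: int = 1
    lstm_cells: int = 256
    # All MLP networks have two layers; the output layers for the
    # mean and log-variance are each 16-dimensional.
    mlp_layers: int = 2
    mean_out_dim: int = 16
    logvar_out_dim: int = 16
    # Optimization: Adam with a fixed learning rate and mini-batch size.
    learning_rate: float = 1e-4
    batch_size: int = 128


cfg = FHVAEConfig()
print(cfg.lstm_cells, cfg.learning_rate, cfg.batch_size)
```

Freezing the dataclass keeps the reported settings immutable, so an experiment run cannot silently drift from the quoted values.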