Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

Authors: Wei-Ning Hsu, Yu Zhang, James Glass

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and to reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
Researcher Affiliation | Academia | Wei-Ning Hsu, Yu Zhang, and James Glass, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. {wnhsu,yzhang87,glass}@csail.mit.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We have released the code for the model described in this paper: https://github.com/wnhsu/FactorizedHierarchicalVAE
Open Datasets | Yes | The following two corpora are used for our experiments: (1) TIMIT [10]... (2) Aurora-4 [32], a broadband corpus designed for noisy speech recognition tasks based on the Wall Street Journal corpus (WSJ0) [31].
Dataset Splits | Yes | Two 14-hour training sets are used, where one is clean and the other is a mix of all four conditions. The same noise types and microphones are used to generate the development and test sets, which both consist of 330 utterances under all 14 conditions, resulting in 4,620 utterances in total for each set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only discusses the neural network architectures and training procedures.
Software Dependencies | No | The paper mentions the algorithms used, such as Adam and LSTM, but does not provide specific version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The dimensions of z1, z2, µ2 are 16, 16, 16 respectively... All the LSTM networks are 1-layered with 256 cells... All the MLP networks have two layers... The dimension of the output layer for MLP networks for mean is 16, and for log variance is 16... The learning rate is fixed at 10^-4... The mini-batch size is 128.
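For anyone attempting a reimplementation, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is ours, not from the paper or its released code; all field names are hypothetical, and the defaults simply transcribe the values quoted above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FHVAEConfig:
    """Hypothetical config mirroring the hyperparameters quoted in the paper."""
    # Latent dimensions: z1 (sequence-level), z2 (segment-level),
    # and mu2 (sequence-dependent prior mean for z2) are all 16-dimensional.
    z1_dim: int = 16
    z2_dim: int = 16
    mu2_dim: int = 16
    # All LSTM networks are 1-layered with 256 cells.
    lstm_layers: int = 1
    lstm_cells: int = 256
    # All MLP networks have two layers; the output layers for the
    # mean and log-variance are each 16-dimensional.
    mlp_layers: int = 2
    mean_out_dim: int = 16
    logvar_out_dim: int = 16
    # Optimization: Adam with a fixed learning rate and mini-batch size.
    learning_rate: float = 1e-4
    batch_size: int = 128


cfg = FHVAEConfig()
print(cfg.lstm_cells, cfg.learning_rate, cfg.batch_size)
```

Freezing the dataclass keeps the reported settings immutable, so an experiment run cannot silently drift from the quoted values.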