Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
Authors: Wei-Ning Hsu, Yu Zhang, James Glass
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks. |
| Researcher Affiliation | Academia | Wei-Ning Hsu, Yu Zhang, and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA 02139, USA {wnhsu,yzhang87,glass}@csail.mit.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have released the code for the model described in this paper: https://github.com/wnhsu/FactorizedHierarchicalVAE |
| Open Datasets | Yes | The following two corpora are used for our experiments: (1) TIMIT [10]... (2) Aurora-4 [32], a broadband corpus designed for noisy speech recognition tasks based on the Wall Street Journal corpus (WSJ0) [31]. |
| Dataset Splits | Yes | Two 14-hour training sets are used, where one is clean and the other is a mix of all four conditions. The same noise types and microphones are used to generate the development and test sets, which both consist of 330 utterances from all 14 conditions, resulting in 4,620 utterances in total for each set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only discusses the neural network architectures and training procedures. |
| Software Dependencies | No | The paper mentions the algorithms used, such as Adam and LSTMs, but does not provide version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The dimensions of z1, z2, µ2 are 16, 16, 16 respectively... All the LSTM networks are 1-layered with 256 cells... All the MLP networks have two layers... The dimension of the output layer for MLP networks for mean is 16, and for log variance is 16... The learning rate is fixed at 10^-4... The mini-batch size is 128. |
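
To make the Experiment Setup row concrete, the following is a minimal, hedged PyTorch sketch that wires up only the hyperparameters quoted above (16-dim z1/z2/µ2, 1-layer LSTMs with 256 cells, two-layer MLPs, Adam with learning rate 10^-4, mini-batch size 128). It is not the authors' released FHVAE implementation; the input feature dimension, segment length, MLP hidden width, and activation choices are assumptions for illustration, and the sequence-level prior and decoder are omitted.

```python
# Hedged sketch (not the authors' released code): encoder sized per the
# hyperparameters quoted in the "Experiment Setup" row above.
import torch
import torch.nn as nn

FEAT_DIM = 80      # assumed per-frame feature dimension (not specified in the quote)
LATENT_DIM = 16    # dimensions of z1, z2, and mu2 per the paper
LSTM_CELLS = 256   # "All the LSTM networks are 1-layered with 256 cells"


class GaussianMLP(nn.Module):
    """Two-layer MLP emitting a 16-dim mean and a 16-dim log-variance.
    The hidden width (256) and tanh activations are assumptions."""
    def __init__(self, in_dim, hidden_dim=256, out_dim=LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
        )
        self.mean = nn.Linear(hidden_dim, out_dim)
        self.logvar = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mean(h), self.logvar(h)


class SegmentEncoder(nn.Module):
    """1-layer LSTM over a segment, followed by Gaussian MLPs for z1 and z2."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(FEAT_DIM, LSTM_CELLS, num_layers=1, batch_first=True)
        self.q_z1 = GaussianMLP(LSTM_CELLS)
        self.q_z2 = GaussianMLP(LSTM_CELLS)

    def forward(self, x):               # x: (batch, frames, FEAT_DIM)
        _, (h_n, _) = self.lstm(x)      # final hidden state of the LSTM
        h = h_n[-1]                     # (batch, LSTM_CELLS)
        return self.q_z1(h), self.q_z2(h)


encoder = SegmentEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)   # fixed learning rate 10^-4
batch = torch.randn(128, 20, FEAT_DIM)                        # mini-batch size 128; 20-frame segments assumed
(z1_mu, z1_logvar), (z2_mu, z2_logvar) = encoder(batch)
```

For the full factorized hierarchical VAE (sequence-level prior over µ2, decoder, and discriminative objective), refer to the released repository linked in the Open Source Code row.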