Multi-view Recurrent Neural Acoustic Word Embeddings
Authors: Wanjia He, Weiran Wang, Karen Livescu
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our acoustic word embeddings improve over previous approaches for the task of word discrimination. We also present results on other tasks that are enabled by the multi-view approach, including cross-view word discrimination and word similarity. |
| Researcher Affiliation | Academia | Wanjia He, Department of Computer Science, University of Chicago, Chicago, IL 60637, USA, wanjia@ttic.edu; Weiran Wang & Karen Livescu, Toyota Technological Institute at Chicago, Chicago, IL 60637, USA, {weiranwang,klivescu}@ttic.edu |
| Pseudocode | No | The paper includes illustrations of the model architecture but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our tensorflow implementation is available at https://github.com/opheadacheh/Multi-view-neural-acoustic-words-embeddings |
| Open Datasets | Yes | The data is drawn from the Switchboard English conversational speech corpus (Godfrey et al., 1992). |
| Dataset Splits | Yes | The train/dev/test splits contain 9971/10966/11024 pairs of acoustic segments and character sequences, corresponding to 1687/3918/3390 unique words. |
| Hardware Specification | No | The paper mentions 'This research used GPUs donated by NVIDIA Corporation' in the acknowledgments, but it does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions 'Our tensorflow implementation' but does not specify the version of TensorFlow or any other software dependencies with their version numbers. |
| Experiment Setup | Yes | 2-layer bidirectional LSTMs with 512 hidden units per direction per layer perform well... We use the Adam optimizer (Kingma & Ba, 2015) for updating the weights using mini-batches of 20 acoustic segments, with an initial learning rate tuned over {0.0001, 0.001}. Dropout is used at each layer, with the rate tuned over {0, 0.2, 0.4, 0.5}, in which 0.4 usually outperformed others. The margin in our basic contrastive objectives 0-3 is tuned over {0.3, 0.4, 0.5, 0.6, 0.7}, out of which 0.4 and 0.5 typically yield best results. For obj0 with the cost-sensitive margin, we tune the maximum margin m_max over {0.5, 0.6, 0.7} and the threshold t_max over {9, 11, 13}. We train each model for up to 1000 epochs. (A sketch of this setup appears below the table.) |
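To make the setup row concrete, here is a minimal sketch, assuming TensorFlow 2.x with Keras (the authors' TensorFlow code predates this API), of the described configuration: a 2-layer bidirectional LSTM encoder with 512 units per direction, dropout at each layer, a margin-based contrastive loss, and the Adam optimizer. The function names, input feature handling, and loss wiring are illustrative assumptions, not the authors' implementation.

```python
import tensorflow as tf

def make_encoder(hidden=512, dropout=0.4):
    # 2-layer bidirectional LSTM, 512 units per direction per layer,
    # with dropout at each layer, as described in the paper.
    # The input shape (batch, time, feature_dim) is inferred lazily
    # on the first call.
    return tf.keras.Sequential([
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(hidden, return_sequences=True)),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(hidden)),  # final states form the embedding
        tf.keras.layers.Dropout(dropout),
    ])

def contrastive_loss(anchor, positive, negative, margin=0.4):
    # Margin-based contrastive objective in the spirit of the paper's
    # obj0-obj3 family (the exact distance and negative-sampling choices
    # here are assumptions): the anchor should be closer to its paired
    # cross-view embedding than to a negative one by at least `margin`.
    def cos_dist(a, b):
        a = tf.math.l2_normalize(a, axis=-1)
        b = tf.math.l2_normalize(b, axis=-1)
        return 1.0 - tf.reduce_sum(a * b, axis=-1)
    return tf.reduce_mean(
        tf.maximum(0.0, margin
                   + cos_dist(anchor, positive)
                   - cos_dist(anchor, negative)))

# Adam with the larger of the two tuned learning rates {0.0001, 0.001};
# the paper trains with mini-batches of 20 acoustic segments.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
encoder = make_encoder()
```

In training, the anchor would be an acoustic-view embedding and the positive/negative would come from the character-sequence view (or vice versa), which is what enables the cross-view word discrimination task the paper reports.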