Multi-view Recurrent Neural Acoustic Word Embeddings
Authors: Wanjia He, Weiran Wang, Karen Livescu
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our acoustic word embeddings improve over previous approaches for the task of word discrimination. We also present results on other tasks that are enabled by the multi-view approach, including cross-view word discrimination and word similarity. |
| Researcher Affiliation | Academia | Wanjia He, Department of Computer Science, University of Chicago, Chicago, IL 60637, USA, wanjia@ttic.edu; Weiran Wang & Karen Livescu, Toyota Technological Institute at Chicago, Chicago, IL 60637, USA, {weiranwang,klivescu}@ttic.edu |
| Pseudocode | No | The paper includes illustrations of the model architecture but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our tensorflow implementation is available at https://github.com/opheadacheh/Multi-view-neural-acoustic-words-embeddings |
| Open Datasets | Yes | The data is drawn from the Switchboard English conversational speech corpus (Godfrey et al., 1992). |
| Dataset Splits | Yes | The train/dev/test splits contain 9971/10966/11024 pairs of acoustic segments and character sequences, corresponding to 1687/3918/3390 unique words. |
| Hardware Specification | No | The paper mentions 'This research used GPUs donated by NVIDIA Corporation' in the acknowledgments, but it does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions 'Our tensorflow implementation' but does not specify the version of TensorFlow or any other software dependencies with their version numbers. |
| Experiment Setup | Yes | 2-layer bidirectional LSTMs with 512 hidden units per direction per layer perform well... We use the Adam optimizer (Kingma & Ba, 2015) for updating the weights using mini-batches of 20 acoustic segments, with an initial learning rate tuned over {0.0001, 0.001}. Dropout is used at each layer, with the rate tuned over {0, 0.2, 0.4, 0.5}, in which 0.4 usually outperformed others. The margin in our basic contrastive objectives 0-3 is tuned over {0.3, 0.4, 0.5, 0.6, 0.7}, out of which 0.4 and 0.5 typically yield best results. For obj0 with the cost-sensitive margin, we tune the maximum margin m_max over {0.5, 0.6, 0.7} and the threshold t_max over {9, 11, 13}. We train each model for up to 1000 epochs. (A sketch of this setup appears below the table.) |
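To make the setup row concrete, here is a minimal sketch, assuming TensorFlow 2.x with Keras (the authors' TensorFlow code predates this API), of the described configuration: a 2-layer bidirectional LSTM encoder with 512 units per direction, dropout at each layer, a margin-based contrastive loss, and the Adam optimizer. The function names, input feature handling, and loss wiring are illustrative assumptions, not the authors' implementation.

```python
import tensorflow as tf

def make_encoder(hidden=512, dropout=0.4):
    # 2-layer bidirectional LSTM, 512 units per direction per layer,
    # with dropout at each layer, as described in the paper.
    # The input shape (batch, time, feature_dim) is inferred lazily
    # on the first call.
    return tf.keras.Sequential([
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(hidden, return_sequences=True)),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(hidden)),  # final states form the embedding
        tf.keras.layers.Dropout(dropout),
    ])

def contrastive_loss(anchor, positive, negative, margin=0.4):
    # Margin-based contrastive objective in the spirit of the paper's
    # obj0-obj3 family (the exact distance and negative-sampling choices
    # here are assumptions): the anchor should be closer to its paired
    # cross-view embedding than to a negative one by at least `margin`.
    def cos_dist(a, b):
        a = tf.math.l2_normalize(a, axis=-1)
        b = tf.math.l2_normalize(b, axis=-1)
        return 1.0 - tf.reduce_sum(a * b, axis=-1)
    return tf.reduce_mean(
        tf.maximum(0.0, margin
                   + cos_dist(anchor, positive)
                   - cos_dist(anchor, negative)))

# Adam with the larger of the two tuned learning rates {0.0001, 0.001};
# the paper trains with mini-batches of 20 acoustic segments.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
encoder = make_encoder()
```

In training, the anchor would be an acoustic-view embedding and the positive/negative would come from the character-sequence view (or vice versa), which is what enables the cross-view word discrimination task the paper reports.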