Deep Graph Random Process for Relational-Thinking-Based Speech Recognition

Authors: Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental evaluations on ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and benefits of our method."
Researcher Affiliation | Academia | National University of Singapore; Massachusetts Institute of Technology.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code, a repository link, or an explicit code-release statement for the described methodology.
Open Datasets | Yes | "We perform the preliminary experiments for the proposed RTN on a reading speech recognition corpus, CHiME-2 (Vincent et al., 2013). We then evaluate our method on CHiME-5 (Barker et al., 2018)... We finally investigate the interpretability of our model on synthetic relational Switchboard (Godfrey et al., 1992) data."
Dataset Splits | Yes | "The training set contains 7138 simulated noisy utterances. The development and test sets contain 2460 and 1980 simulated noisy utterances respectively. [...] The training, development, and test datasets include about 40 hours, 4 hours, and 5 hours of real conversational speech respectively."
Hardware Specification | Yes | "In our experiments, the timing experiments used the PyTorch package and were performed on a machine running the Ubuntu operating system with a single Intel Xeon Silver 4214 CPU and a GTX 2080Ti GPU." (A timing sketch in this style appears below the table.)
Software Dependencies | No | The paper mentions using the "PyTorch package" but does not specify a version number for it or for other key software dependencies. (A version-logging snippet follows the table.)
Experiment Setup | Yes | "We adopt the neural network fθ to calculate the node embedding of DGP. The architecture of such a neural network has 6 SRU layers (each with 1024 hidden states), which is first followed by a max-pooling layer and then a single-layer multi-layer perceptron (MLP). [...] The size of the latent vector of VSRU is set as 4 for CHiME-5 and 16 for CHiME-2 and Relational SWB. [...] To jointly optimize the above components, we adopt a variational inference framework and derive an effective evidence lower bound (ELBO). [...] All RNNs were trained by optimizing the categorical cross-entropy using BPTT and SGD. We applied a dropout rate of 0.1 to the connections between recurrent layers." (An architecture sketch of fθ follows the table.)
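The hardware row quotes timing experiments run with PyTorch on a GTX 2080Ti. For reference, a minimal sketch of how such GPU timings are typically taken in PyTorch is shown below; the model, input shape, and batch size are illustrative assumptions, not values from the paper.

import time
import torch

# Stand-in model and input; the paper's RTN is not publicly released,
# so the module and shapes here are assumptions for illustration only.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.GRU(input_size=40, hidden_size=1024, num_layers=6).to(device)
x = torch.randn(500, 8, 40, device=device)  # (time, batch, features)

with torch.no_grad():
    model(x)  # warm-up pass so lazy CUDA initialization is excluded

if device.type == "cuda":
    torch.cuda.synchronize()  # CUDA ops are asynchronous; sync before timing
start = time.perf_counter()
with torch.no_grad():
    model(x)
if device.type == "cuda":
    torch.cuda.synchronize()  # sync again so the GPU work is fully counted
print(f"forward pass took {time.perf_counter() - start:.4f} s")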
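The dependencies row flags that no PyTorch version is reported. A run can record the software stack it actually used with a few standard PyTorch attributes, nothing paper-specific:

import sys
import torch

# Log the exact versions a run used; the paper omits these.
print("python:", sys.version.split()[0])
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)              # None on CPU-only builds
print("cudnn :", torch.backends.cudnn.version())  # None on CPU-only builds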
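The experiment-setup row describes the node-embedding network fθ: 6 SRU layers with 1024 hidden states, max-pooling, then a single-layer MLP, with dropout 0.1 between recurrent layers. Below is a minimal sketch of that description, assuming the open-source sru package (pip install sru) for the SRU layers; the input and embedding dimensions are illustrative assumptions, since the excerpt does not quote them.

import torch
import torch.nn as nn
from sru import SRU  # assumed: the open-source SRU implementation (pip install sru)

class NodeEmbeddingNet(nn.Module):
    # Sketch of f_theta as quoted above: 6 SRU layers (1024 hidden states),
    # max-pooling over time, then a single-layer MLP. input_dim and embed_dim
    # are illustrative assumptions, not values from the paper.
    def __init__(self, input_dim=40, hidden_dim=1024, embed_dim=16):
        super().__init__()
        # dropout=0.1 mirrors the quoted dropout between recurrent layers.
        self.sru = SRU(input_dim, hidden_dim, num_layers=6, dropout=0.1)
        self.mlp = nn.Linear(hidden_dim, embed_dim)  # "single-layer MLP"

    def forward(self, x):
        # x: (time, batch, input_dim); SRU returns (outputs, final_states).
        h, _ = self.sru(x)
        pooled, _ = h.max(dim=0)  # max-pool over the time dimension
        return self.mlp(pooled)

model = NodeEmbeddingNet()
emb = model(torch.randn(200, 4, 40))  # 200 frames, batch of 4 utterances
print(emb.shape)  # torch.Size([4, 16])

The quoted training recipe (categorical cross-entropy with BPTT and SGD) would then apply on top of this embedding network together with the VSRU and the ELBO objective, none of which can be reconstructed from the excerpt alone.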