Deep Graph Random Process for Relational-Thinking-Based Speech Recognition
Authors: Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations on ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and benefits of our method. |
| Researcher Affiliation | Academia | National University of Singapore; Massachusetts Institute of Technology. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code, a repository link, or an explicit code release statement for the methodology described. |
| Open Datasets | Yes | We perform the preliminary experiments for the proposed RTN on a reading speech recognition corpus, CHiME-2 (Vincent et al., 2013). We then evaluate our method on CHiME-5 (Barker et al., 2018)... We finally investigate the interpretability of our model on a synthetic relational Switchboard (Godfrey et al., 1992) data. |
| Dataset Splits | Yes | The training set contains 7138 simulated noisy utterances. The development and test sets contain 2460 and 1980 simulated noisy utterances respectively. [...] The training dataset, development dataset and test dataset include about 40 hours, 4 hours, and 5 hours of real conversational speech respectively. |
| Hardware Specification | Yes | In our experiments, the timing experiments used the PyTorch package and were performed on a machine running the Ubuntu operating system with a single Intel Xeon Silver 4214 CPU and a GTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions using the 'PyTorch package' but does not specify a version number for it or for other key software dependencies. |
| Experiment Setup | Yes | We adopt the neural network fθ to calculate the node embedding of the DGP. The architecture of this network has 6 SRU layers (each with 1024 hidden states), followed first by a max-pooling layer and then by a single-layer multi-layer perceptron (MLP). [...] The size of the latent vector of the VSRU is set to 4 for CHiME-5 and 16 for CHiME-2 and Relational SWB. [...] To jointly optimize the above components, we adopt a variational inference framework and successfully derive an effective evidence lower bound (ELBO). [...] All RNNs were trained by optimizing the categorical cross-entropy using BPTT and SGD. We applied a dropout rate of 0.1 to the connections between recurrent layers. (Minimal sketches of this architecture and training recipe follow the table.) |
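
The quoted setup specifies the node-embedding network fθ closely enough to sketch it. Below is a minimal PyTorch sketch, assuming 80-dimensional acoustic features, max-pooling over the time axis, and a 256-dimensional embedding (none of these values appear in the excerpt above), with `nn.GRU` standing in for the SRU layers since no SRU implementation is cited.

```python
import torch
import torch.nn as nn

class NodeEmbeddingNet(nn.Module):
    """Sketch of the DGP node-embedding network f_theta:
    6 recurrent layers (1024 hidden states each), a max-pooling
    layer over time, then a single-layer MLP.
    nn.GRU is a stand-in for the SRU layers used in the paper."""

    def __init__(self, feat_dim=80, hidden=1024, layers=6, emb_dim=256):
        super().__init__()
        # 6 stacked recurrent layers with 1024 hidden states each;
        # dropout 0.1 between recurrent layers, as quoted above.
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=layers,
                          dropout=0.1, batch_first=True)
        # Single-layer MLP mapping the pooled state to the node embedding.
        self.mlp = nn.Linear(hidden, emb_dim)

    def forward(self, x):
        # x: (batch, time, feat_dim) acoustic features
        h, _ = self.rnn(x)     # (batch, time, hidden)
        h, _ = h.max(dim=1)    # max-pooling over the time axis
        return self.mlp(h)     # (batch, emb_dim) node embedding


# Example: embed a batch of 4 two-second utterances (200 frames each).
net = NodeEmbeddingNet()
emb = net(torch.randn(4, 200, 80))
print(emb.shape)  # torch.Size([4, 256])
```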
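
Likewise, the quoted training regime (categorical cross-entropy, BPTT, SGD, dropout 0.1 between recurrent layers) maps onto a standard recipe. A minimal sketch under stated assumptions: `FrameClassifier` is a hypothetical stand-in for the paper's RTN, and the senone count, learning rate, and batch size are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class FrameClassifier(nn.Module):
    """Hypothetical stand-in acoustic model: a recurrent encoder with
    per-frame senone logits; the paper's full RTN is not quoted above."""
    def __init__(self, feat_dim=80, hidden=512, num_senones=2000):
        super().__init__()
        # dropout 0.1 on the connections between recurrent layers
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=2,
                          dropout=0.1, batch_first=True)
        self.out = nn.Linear(hidden, num_senones)

    def forward(self, x):       # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        return self.out(h)      # (batch, time, num_senones)

model = FrameClassifier()
criterion = nn.CrossEntropyLoss()  # categorical cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # lr is illustrative

def train_step(feats, labels):
    """One SGD step; gradients flow through time (BPTT) via the GRU."""
    optimizer.zero_grad()
    logits = model(feats)  # (batch, time, num_senones)
    loss = criterion(logits.reshape(-1, logits.size(-1)),
                     labels.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()

# Example step on a batch of 8 utterances with frame-level labels.
loss = train_step(torch.randn(8, 200, 80),
                  torch.randint(0, 2000, (8, 200)))
```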