Deep Graph Random Process for Relational-Thinking-Based Speech Recognition
Authors: Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations on ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and benefits of our method. |
| Researcher Affiliation | Academia | National University of Singapore; Massachusetts Institute of Technology. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code, a repository link, or an explicit code release statement for the methodology described. |
| Open Datasets | Yes | We perform the preliminary experiments for the proposed RTN on a reading speech recognition corpus, CHiME-2 (Vincent et al., 2013). We then evaluate our method on CHiME-5 (Barker et al., 2018)... We finally investigate the interpretability of our model on a synthetic relational Switchboard (Godfrey et al., 1992) data. |
| Dataset Splits | Yes | The training set contains 7138 simulated noisy utterances. The development and test sets contain 2460 and 1980 simulated noisy utterances respectively. [...] The training dataset, development dataset and test dataset include about 40 hours, 4 hours, and 5 hours of real conversational speech respectively. |
| Hardware Specification | Yes | In our experiments, the timing experiments used the PyTorch package and were performed on a machine running the Ubuntu operating system with a single Intel Xeon Silver 4214 CPU and a GTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions using the 'PyTorch package' but does not specify a version number for it or for other key software dependencies. |
| Experiment Setup | Yes | We adopt the neural network fθ to calculate the node embedding of the DGP. The architecture of this network has 6 SRU layers (each with 1024 hidden states), followed first by a max-pooling layer and then by a single-layer multi-layer perceptron (MLP). [...] The size of the latent vector of the VSRU is set to 4 for CHiME-5 and 16 for CHiME-2 and Relational SWB. [...] To jointly optimize the above components, we adopt a variational inference framework and successfully derive an effective evidence lower bound (ELBO). [...] All RNNs were trained by optimizing the categorical cross-entropy using BPTT and SGD. We applied a dropout rate of 0.1 to the connections between recurrent layers. (Minimal sketches of this architecture and training recipe follow the table.) |
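
The quoted setup specifies the node-embedding network fθ closely enough to sketch it. Below is a minimal PyTorch sketch, assuming 80-dimensional acoustic features, max-pooling over the time axis, and a 256-dimensional embedding (none of these values appear in the excerpt above), with `nn.GRU` standing in for the SRU layers since no SRU implementation is cited.

```python
import torch
import torch.nn as nn

class NodeEmbeddingNet(nn.Module):
    """Sketch of the DGP node-embedding network f_theta:
    6 recurrent layers (1024 hidden states each), a max-pooling
    layer over time, then a single-layer MLP.
    nn.GRU is a stand-in for the SRU layers used in the paper."""

    def __init__(self, feat_dim=80, hidden=1024, layers=6, emb_dim=256):
        super().__init__()
        # 6 stacked recurrent layers with 1024 hidden states each;
        # dropout 0.1 between recurrent layers, as quoted above.
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=layers,
                          dropout=0.1, batch_first=True)
        # Single-layer MLP mapping the pooled state to the node embedding.
        self.mlp = nn.Linear(hidden, emb_dim)

    def forward(self, x):
        # x: (batch, time, feat_dim) acoustic features
        h, _ = self.rnn(x)     # (batch, time, hidden)
        h, _ = h.max(dim=1)    # max-pooling over the time axis
        return self.mlp(h)     # (batch, emb_dim) node embedding


# Example: embed a batch of 4 two-second utterances (200 frames each).
net = NodeEmbeddingNet()
emb = net(torch.randn(4, 200, 80))
print(emb.shape)  # torch.Size([4, 256])
```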
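
Likewise, the quoted training regime (categorical cross-entropy, BPTT, SGD, dropout 0.1 between recurrent layers) maps onto a standard recipe. A minimal sketch under stated assumptions: `FrameClassifier` is a hypothetical stand-in for the paper's RTN, and the senone count, learning rate, and batch size are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class FrameClassifier(nn.Module):
    """Hypothetical stand-in acoustic model: a recurrent encoder with
    per-frame senone logits; the paper's full RTN is not quoted above."""
    def __init__(self, feat_dim=80, hidden=512, num_senones=2000):
        super().__init__()
        # dropout 0.1 on the connections between recurrent layers
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=2,
                          dropout=0.1, batch_first=True)
        self.out = nn.Linear(hidden, num_senones)

    def forward(self, x):       # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        return self.out(h)      # (batch, time, num_senones)

model = FrameClassifier()
criterion = nn.CrossEntropyLoss()  # categorical cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # lr is illustrative

def train_step(feats, labels):
    """One SGD step; gradients flow through time (BPTT) via the GRU."""
    optimizer.zero_grad()
    logits = model(feats)  # (batch, time, num_senones)
    loss = criterion(logits.reshape(-1, logits.size(-1)),
                     labels.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()

# Example step on a batch of 8 utterances with frame-level labels.
loss = train_step(torch.randn(8, 200, 80),
                  torch.randint(0, 2000, (8, 200)))
```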