A Streaming End-to-End Framework For Spoken Language Understanding

Authors: Nihal Potdar, Anderson Raymundo Avila, Chao Xing, Dong Wang, Yiran Cao, Xiao Chen

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our solution on the Fluent Speech Commands (FSC) dataset and the intent detection accuracy is about 97% on all multi-intent settings.
Researcher Affiliation | Collaboration | Nihal Potdar¹, Anderson R. Avila², Chao Xing², Dong Wang³, Yiran Cao¹, Xiao Chen²; ¹University of Waterloo, ²Huawei Noah's Ark Lab, ³Tsinghua University
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code availability for the described methodology.
Open Datasets | Yes | The Fluent Speech Commands (FSC) dataset [Lugosch et al., 2019] was used to train and evaluate our SLU model for intent classification. ... We also used the Google Speech Commands (GSC) dataset [Warden, 2018].
Dataset Splits | Yes | The whole dataset was split into three subsets: the training set (FSC-Tr) contained 14.7 hours of data, totalling 23,132 utterances from 77 speakers; the validation set (FSC-Val) and test set (FSC-Tst) comprised 1.9 and 2.4 hours of speech, leading to 3,118 utterances from 10 speakers and 3,793 utterances from other 10 speakers, respectively. (A split-loading sketch is given below, after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or types) used for running the experiments.
Software Dependencies | No | The paper mentions 'The Kaldi toolkit is used' and 'ADAM optimizer', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | The model was trained using the ADAM optimizer [Loshchilov and Hutter, 2017], with the initial learning rate set to 0.0001. Dropout probability was set to 0.1 and the parameter for weight decay was set to 0.2. For the ASR pre-training, the ASR model was trained for 100 epochs; for the CE pre-training, the model was trained for 10 epochs with the CE criterion. (A hedged configuration sketch follows the table.)
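
The split sizes quoted under Dataset Splits correspond to the manifests shipped with the public FSC release. The sketch below is a minimal loading example, not code from the paper; the root directory, the data/{train,valid,test}_data.csv file layout, and the action/object/location columns are assumptions based on the public dataset download rather than anything stated in the text.

    import pandas as pd

    # Hypothetical local root of the public FSC download; adjust as needed.
    FSC_ROOT = "fluent_speech_commands_dataset"

    def load_fsc_split(name):
        # Load one of the three official split manifests: "train", "valid" or "test".
        df = pd.read_csv(f"{FSC_ROOT}/data/{name}_data.csv")
        # Each row lists a wav path, speaker id, transcription and three intent
        # slots; the combined intent label is their concatenation.
        df["intent"] = df["action"] + "_" + df["object"] + "_" + df["location"]
        return df

    train_df = load_fsc_split("train")  # FSC-Tr: 23,132 utterances, 77 speakers
    valid_df = load_fsc_split("valid")  # FSC-Val: 3,118 utterances, 10 speakers
    test_df = load_fsc_split("test")    # FSC-Tst: 3,793 utterances, 10 other speakers

    print(len(train_df), len(valid_df), len(test_df))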
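
The hyperparameters quoted under Experiment Setup can be summarised as a training configuration. The sketch below assumes a PyTorch implementation; the model definition, feature dimensionality, loss function and data loader are hypothetical placeholders, while the optimizer family (the cited [Loshchilov and Hutter, 2017] reference describes decoupled weight decay, i.e. AdamW), learning rate, weight decay, dropout probability and epoch counts come from the quoted text.

    import torch

    # Hypothetical stand-in for the streaming SLU encoder; the paper does not
    # release its architecture, so any nn.Module with dropout p=0.1 fits here.
    model = torch.nn.Sequential(
        torch.nn.Linear(80, 256),
        torch.nn.ReLU(),
        torch.nn.Dropout(p=0.1),   # dropout probability from the paper
        torch.nn.Linear(256, 31),  # e.g. the 31 FSC intents
    )

    # The cited optimizer reference corresponds to AdamW (decoupled weight decay);
    # learning rate and weight decay values are quoted from the paper.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.2)

    ASR_PRETRAIN_EPOCHS = 100  # ASR pre-training stage
    CE_PRETRAIN_EPOCHS = 10    # cross-entropy (CE) pre-training stage

    def train_stage(num_epochs, loss_fn, loader):
        # Generic loop shared by both pre-training stages (sketch only).
        for _ in range(num_epochs):
            for features, targets in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(features), targets)
                loss.backward()
                optimizer.step()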