A Streaming End-to-End Framework For Spoken Language Understanding

Authors: Nihal Potdar, Anderson Raymundo Avila, Chao Xing, Dong Wang, Yiran Cao, Xiao Chen

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our solution on the Fluent Speech Commands (FSC) dataset and the intent detection accuracy is about 97% on all multi-intent settings.
Researcher Affiliation | Collaboration | Nihal Potdar¹, Anderson R. Avila², Chao Xing², Dong Wang³, Yiran Cao¹, Xiao Chen²; ¹University of Waterloo, ²Huawei Noah's Ark Lab, ³Tsinghua University
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code availability for the described methodology.
Open Datasets | Yes | The Fluent Speech Commands (FSC) dataset [Lugosch et al., 2019] was used to train and evaluate our SLU model for intent classification. ... We also used the Google Speech Commands (GSC) dataset [Warden, 2018].
Dataset Splits | Yes | The whole dataset was split into three subsets: the training set (FSC-Tr) contained 14.7 hours of data, totalling 23,132 utterances from 77 speakers; the validation set (FSC-Val) and test set (FSC-Tst) comprised 1.9 and 2.4 hours of speech, leading to 3,118 utterances from 10 speakers and 3,793 utterances from other 10 speakers, respectively. (A split-loading sketch is given below, after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or types) used for running the experiments.
Software Dependencies | No | The paper mentions 'The Kaldi toolkit is used' and 'ADAM optimizer', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | The model was trained using the ADAM optimizer [Loshchilov and Hutter, 2017], with the initial learning rate set to 0.0001. Dropout probability was set to 0.1 and the parameter for weight decay was set to 0.2. For the ASR pre-training, the ASR model was trained for 100 epochs; for the CE pre-training, the model was trained for 10 epochs with the CE criterion. (A hedged configuration sketch follows the table.)
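
The split sizes quoted under Dataset Splits correspond to the manifests shipped with the public FSC release. The sketch below is a minimal loading example, not code from the paper; the root directory, the data/{train,valid,test}_data.csv file layout, and the action/object/location columns are assumptions based on the public dataset download rather than anything stated in the text.

    import pandas as pd

    # Hypothetical local root of the public FSC download; adjust as needed.
    FSC_ROOT = "fluent_speech_commands_dataset"

    def load_fsc_split(name):
        # Load one of the three official split manifests: "train", "valid" or "test".
        df = pd.read_csv(f"{FSC_ROOT}/data/{name}_data.csv")
        # Each row lists a wav path, speaker id, transcription and three intent
        # slots; the combined intent label is their concatenation.
        df["intent"] = df["action"] + "_" + df["object"] + "_" + df["location"]
        return df

    train_df = load_fsc_split("train")  # FSC-Tr: 23,132 utterances, 77 speakers
    valid_df = load_fsc_split("valid")  # FSC-Val: 3,118 utterances, 10 speakers
    test_df = load_fsc_split("test")    # FSC-Tst: 3,793 utterances, 10 other speakers

    print(len(train_df), len(valid_df), len(test_df))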
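
The hyperparameters quoted under Experiment Setup can be summarised as a training configuration. The sketch below assumes a PyTorch implementation; the model definition, feature dimensionality, loss function and data loader are hypothetical placeholders, while the optimizer family (the cited [Loshchilov and Hutter, 2017] reference describes decoupled weight decay, i.e. AdamW), learning rate, weight decay, dropout probability and epoch counts come from the quoted text.

    import torch

    # Hypothetical stand-in for the streaming SLU encoder; the paper does not
    # release its architecture, so any nn.Module with dropout p=0.1 fits here.
    model = torch.nn.Sequential(
        torch.nn.Linear(80, 256),
        torch.nn.ReLU(),
        torch.nn.Dropout(p=0.1),   # dropout probability from the paper
        torch.nn.Linear(256, 31),  # e.g. the 31 FSC intents
    )

    # The cited optimizer reference corresponds to AdamW (decoupled weight decay);
    # learning rate and weight decay values are quoted from the paper.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.2)

    ASR_PRETRAIN_EPOCHS = 100  # ASR pre-training stage
    CE_PRETRAIN_EPOCHS = 10    # cross-entropy (CE) pre-training stage

    def train_stage(num_epochs, loss_fn, loader):
        # Generic loop shared by both pre-training stages (sketch only).
        for _ in range(num_epochs):
            for features, targets in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(features), targets)
                loss.backward()
                optimizer.step()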