MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering

Authors: Chenyu You, Nuo Chen, Yuexian Zou

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that the proposed MRD-Net achieves superior results compared with state-of-the-art methods on three spoken question answering benchmark datasets."
Researcher Affiliation | Collaboration | Chenyu You (Department of Electrical Engineering, Yale University, USA); Nuo Chen (ADSPLAB, School of ECE, Peking University, Shenzhen, China); Yuexian Zou (ADSPLAB, School of ECE, Peking University, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code, nor a link to a code repository.
Open Datasets | Yes | Spoken-SQuAD [Li et al., 2018] is an English listening comprehension dataset... The 2018 Formosa Grand Challenge (FGC) dataset (https://fgc.stpi.narl.org.tw/activity/techai2018) is a Mandarin Chinese spoken multi-choice question answering (MCQA) dataset... Spoken-CoQA [You et al., 2020a] is an English spoken conversational question answering (SCQA) dataset...
Dataset Splits | Yes | Spoken-SQuAD [Li et al., 2018] is an English listening comprehension dataset, which contains 37k ASR-transcript question pairs in the training set and 5.4k in the testing set. The 2018 Formosa Grand Challenge (FGC) dataset is a Mandarin Chinese spoken multi-choice question answering (MCQA) dataset, which includes 7k passage-question-choices (PQC) pairs as the training set and 1.5k as the development set. Spoken-CoQA [You et al., 2020a] is an English spoken conversational question answering (SCQA) dataset, which consists of 40k question-answer pairs from 4k conversations in the training set and 3.8k question-answer pairs from 380 conversations in the test set, drawn from seven diverse domains.
Hardware Specification | Yes | "We train our student model using 2 NVIDIA 2080Ti GPUs."
Software Dependencies | No | The paper mentions software components such as the BPE and VQ-Wav2Vec tokenizers, the AdamW optimizer, and the Kaldi toolkit, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | "The maximum sequence lengths of T and S are 512, and that of Audio-A is 1024. We use the AdamW optimizer in training, with the learning rate set to 8e-6. All models are trained with a batch size of 4. The hyperparameters τ and α are set to 1 and 0.9, respectively."
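The reported hyperparameters τ (a temperature) and α (a mixing weight) are consistent with a standard Hinton-style knowledge-distillation objective. As an illustration only, a minimal sketch of such an objective follows; the paper does not spell out its exact loss, so the function names and the precise weighting below are assumptions, not MRD-Net's actual formulation:

```python
import math

# Hyperparameter values reported in the paper's experiment setup.
TAU = 1.0    # distillation temperature τ
ALPHA = 0.9  # assumed weight on the distillation term (α); hypothetical role


def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target_idx,
                      tau=TAU, alpha=ALPHA):
    """Standard KD objective: alpha * tau^2 * KL(teacher || student)
    + (1 - alpha) * cross-entropy on the ground-truth label."""
    p_teacher = softmax(teacher_logits, tau)
    p_student = softmax(student_logits, tau)
    # KL divergence between softened teacher and student distributions.
    kd = tau * tau * sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
    )
    # Cross-entropy of the (unsoftened) student prediction vs. the label.
    ce = -math.log(softmax(student_logits)[target_idx])
    return alpha * kd + (1.0 - alpha) * ce
```

With α = 0.9 the objective is dominated by the teacher-matching term; when student and teacher logits agree, the KL term vanishes and only 10% of the label cross-entropy remains.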