MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering
Authors: Chenyu You, Nuo Chen, Yuexian Zou
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed MRD-Net achieves superior results compared with state-of-the-art methods on three spoken question answering benchmark datasets. |
| Researcher Affiliation | Collaboration | Chenyu You¹, Nuo Chen², Yuexian Zou²,³; ¹Department of Electrical Engineering, Yale University, USA; ²ADSPLAB, School of ECE, Peking University, Shenzhen, China; ³Peng Cheng Laboratory, Shenzhen, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing open-source code or links to a code repository. |
| Open Datasets | Yes | Spoken-SQuAD [Li et al., 2018] is an English listening comprehension dataset... The 2018 Formosa Grand Challenge (FGC) dataset (https://fgc.stpi.narl.org.tw/activity/techai2018) is a Mandarin Chinese spoken multi-choice question answering (MCQA) dataset... Spoken-CoQA [You et al., 2020a] is an English spoken conversational question answering (SCQA) dataset... |
| Dataset Splits | Yes | Spoken-SQuAD [Li et al., 2018] is an English listening comprehension dataset, which contains 37k ASR-transcript question pairs in the training set and 5.4k in the testing set, respectively. The 2018 Formosa Grand Challenge (FGC) dataset is a Mandarin Chinese spoken multi-choice question answering (MCQA) dataset, which includes 7k passage-question-choices (PQC) pairs as the training set and 1.5k as the development set, respectively. Spoken-CoQA [You et al., 2020a] is an English spoken conversational question answering (SCQA) dataset, which consists of 40k question-answer pairs from 4k conversations in the training set and 3.8k question-answer pairs from 380 conversations in the test set, drawn from seven diverse domains. |
| Hardware Specification | Yes | We train our student model using 2 NVIDIA 2080Ti GPUs. |
| Software Dependencies | No | The paper mentions software components such as 'BPE as the tokenizer', 'vq-wav2vec as tokenizer', the 'AdamW optimizer', and the 'Kaldi toolkit', but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The maximum sequence lengths of T and S are 512, and that of Audio-A is 1024. We utilize the AdamW optimizer in training, and the learning rate is set to 8e-6. All models are trained with a batch size of 4. The hyperparameters τ and α are set to 1 and 0.9, respectively. |
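
The Experiment Setup row reports concrete hyperparameters (AdamW, learning rate 8e-6, batch size 4, τ = 1, α = 0.9) but not how they fit together. Below is a minimal PyTorch sketch of a standard temperature-scaled knowledge-distillation objective using those reported values; the function name `distillation_loss`, the `student`/`teacher` placeholders, and the choice to let α weight the soft-label term are assumptions for illustration, not the paper's actual MRD-Net implementation.

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW

# Reported hyperparameters from the paper's experiment setup.
TAU, ALPHA, LR, BATCH_SIZE = 1.0, 0.9, 8e-6, 4

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      tau: float = TAU,
                      alpha: float = ALPHA) -> torch.Tensor:
    """Weighted sum of a hard-label CE term and a soft-label KD term.

    Assumption: alpha weights the distillation (soft-label) term; the
    paper only states tau = 1 and alpha = 0.9, not the exact combination.
    """
    # Hard-label cross-entropy against the gold labels.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label KL divergence between temperature-scaled distributions;
    # the tau**2 factor keeps gradient magnitudes comparable across taus.
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * (tau ** 2)
    return alpha * kd + (1.0 - alpha) * ce

# Optimizer setup as reported: AdamW with learning rate 8e-6.
# `student` is a placeholder for the paper's student network.
# optimizer = AdamW(student.parameters(), lr=LR)
```

With τ = 1 the soft targets are the teacher's unscaled softmax outputs, so the temperature term is a no-op here; it is kept in the sketch because the paper names τ as a tunable hyperparameter.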