Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation
Authors: Chih-Chun Yang, Wan-Cyuan Fan, Cheng-Fu Yang, Yu-Chiang Frank Wang
AAAI 2022, pp. 3036-3044
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on different recognition and synthesis tasks to show that our model performs favorably against state-of-the-art approaches on each individual task, while ours is a unified solution that is able to jointly tackle the aforementioned audio-visual learning tasks. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science and Information Engineering, National Taiwan University, Taiwan, R.O.C. 2Graduate Institute of Communication Engineering, National Taiwan University, Taiwan, R.O.C. 3ASUS Intelligent Cloud Services, Taiwan, R.O.C. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | LRW (Chung and Zisserman 2016): Known for its variety of speaking styles and head poses across subjects, LRW is an English-speaking video dataset collected from BBC programs with more than 1000 speakers. ... LRW-1000 (Yang et al. 2019): LRW-1000 is a Mandarin-speaking video dataset collected from more than 2,000 subjects with 1,000 vocabulary size. |
| Dataset Splits | No | The paper uses standard datasets (LRW and LRW-1000) but does not explicitly provide specific train/validation/test split percentages, sample counts, or detailed splitting methodology needed for reproduction. It mentions 'validation' in the context of a loss function, not dataset splitting. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments. It only states, 'We implement our model using Pytorch.' |
| Software Dependencies | No | The paper mentions 'Pytorch' and 'Adam W' but does not provide specific version numbers for these or any other software dependencies, making the setup irreproducible in terms of software. |
| Experiment Setup | Yes | AdamW is used as the optimizer for training with weight decay 5×10⁻⁴ as regularization. For the linguistic and synthesis modules, initial learning rates of 3×10⁻⁴ and 1×10⁻⁴ with a schedule of reduction are applied, respectively. |
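The optimizer settings reported above can be sketched in PyTorch (the framework the paper states it uses). This is a minimal, hedged reconstruction: the module classes are placeholders, and the paper does not name its exact learning-rate scheduler, so `ReduceLROnPlateau` below is only one plausible choice for the "schedule of reduction".

```python
# Hedged sketch of the reported training setup: AdamW with weight decay 5e-4,
# and separate initial learning rates of 3e-4 (linguistic module) and
# 1e-4 (synthesis module). Module definitions are placeholders, not the
# paper's actual architecture.
import torch
import torch.nn as nn

linguistic_module = nn.Linear(256, 256)  # placeholder for the linguistic module
synthesis_module = nn.Linear(256, 256)   # placeholder for the synthesis module

optimizer = torch.optim.AdamW(
    [
        {"params": linguistic_module.parameters(), "lr": 3e-4},
        {"params": synthesis_module.parameters(), "lr": 1e-4},
    ],
    weight_decay=5e-4,
)

# The paper only mentions "a schedule of reduction"; ReduceLROnPlateau is an
# assumed stand-in, not confirmed by the paper.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5)
```

Note that because the paper omits batch size, number of epochs, and scheduler details, this fragment fixes only the hyperparameters it explicitly reports.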