COCA: COllaborative CAusal Regularization for Audio-Visual Question Answering
Authors: Mingrui Lao, Nan Pu, Yu Liu, Kai He, Erwin M. Bakker, Michael S. Lew
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness as well as backbone-agnostic ability of our COCA strategy, and it achieves state-of-the-art performance on the large-scale MUSIC-AVQA dataset. |
| Researcher Affiliation | Academia | ¹LIACS Media Lab, Leiden University; ²International School of Information Science & Engineering, Dalian University of Technology. {m.lao, n.pu, k.he, e.m.bakker, m.s.k.lew}@liacs.leidenuniv.nl, liuyu8824@dlut.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | We evaluate our method on the large-scale MUSIC-AVQA dataset (Li et al. 2022a), which consists of more than 40K question-answer pairs covering comprehensive question types over textual, visual and audio modalities. |
| Dataset Splits | No | The paper mentions a "training split" and "test set" but does not provide specific percentages or counts for training, validation, and test datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions pre-trained models (ResNet18, VGGish) and feature dimensions but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The initial learning rate is 1e-4, which is decayed by a factor of 0.1 every 8 epochs. The mini-batch size and maximum number of epochs are 64 and 24, respectively. The optimal trade-off factor we select is α = 0.75, which is also validated in Fig. 5. |
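
The step-decay schedule quoted in the Experiment Setup row can be sketched in a few lines of plain Python. This is only an illustration of the stated numbers, not the authors' code: the paper does not name a framework, and the function name `learning_rate`, the constant names, and the 0-indexed epoch convention are assumptions made here.

```python
# Values quoted in the Experiment Setup row of the table.
INITIAL_LR = 1e-4    # initial learning rate
DECAY_FACTOR = 0.1   # multiplier applied at each decay step
DECAY_EVERY = 8      # epochs between decay steps
MAX_EPOCHS = 24      # maximum number of training epochs


def learning_rate(epoch: int) -> float:
    """Return the step-decayed learning rate for a given epoch.

    Assumption: epochs are 0-indexed, so epochs 0-7 use the initial rate,
    epochs 8-15 use one decay step, and epochs 16-23 use two.
    """
    if not 0 <= epoch < MAX_EPOCHS:
        raise ValueError(f"epoch must be in [0, {MAX_EPOCHS}), got {epoch}")
    return INITIAL_LR * DECAY_FACTOR ** (epoch // DECAY_EVERY)


# Full schedule over the training run, one value per epoch.
schedule = [learning_rate(epoch) for epoch in range(MAX_EPOCHS)]
```

Under these assumptions the run uses three learning-rate plateaus of 8 epochs each, with the rate dropping by one order of magnitude at each boundary.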