Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Patch-level Sounding Object Tracking for Audio-Visual Question Answering
Authors: Zhangbin Li, Jinxing Zhou, Jing Zhang, Shengeng Tang, Kun Li, Dan Guo
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on standard datasets demonstrate the effectiveness of our method, achieving competitive performance even compared to recent large-scale pretraining-based approaches. Extensive quantitative and qualitative results validate the effectiveness of our method. |
| Researcher Affiliation | Academia | School of Computer Science and Information Engineering, Hefei University of Technology EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulations, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or mention code in supplementary materials. |
| Open Datasets | Yes | We primarily conduct experiments on the widely-used and challenging MUSIC-AVQA (Li et al. 2022) dataset. |
| Dataset Splits | Yes | Following the standard protocol in the pioneering work (Li et al. 2022), we adopt the answer prediction accuracy (%) as the metric for model evaluation. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA A40 GPU. |
| Software Dependencies | No | The paper mentions models and optimizers like CLIP-ViT-L/14, CLAP, and the AdamW optimizer, but it does not provide specific version numbers for core software dependencies such as programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA. |
| Experiment Setup | Yes | During model training, we use the AdamW optimizer with an initial learning rate of 1e-4, which decays by 0.1 every 16 epochs. The batch size and epochs are set to 16 and 35, respectively. The numbers of graph layers in $G_t^m$, $G_t^s$, and $G_t^q$ are empirically set to 3, 3, and 2, respectively. |
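
The training schedule quoted in the Experiment Setup row can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the names `TrainConfig`, `lr_at_epoch`, and `lr_schedule` are hypothetical, epochs are assumed to be zero-indexed, and "decays by 0.1 every 16 epochs" is read as multiplying the learning rate by 0.1 at each 16-epoch boundary. The paper does not state which framework or scheduler implementation was used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row (names are hypothetical)."""
    initial_lr: float = 1e-4       # initial learning rate for AdamW
    decay_factor: float = 0.1      # assumed multiplicative decay
    decay_every: int = 16          # epochs between decay steps
    batch_size: int = 16
    epochs: int = 35
    graph_layers: Tuple[int, int, int] = (3, 3, 2)  # layers in the three graphs


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Step-decay learning rate for a zero-indexed epoch (an assumed convention)."""
    if not 0 <= epoch < cfg.epochs:
        raise ValueError(f"epoch must be in [0, {cfg.epochs}), got {epoch}")
    # Integer division counts how many decay boundaries have been crossed.
    return cfg.initial_lr * cfg.decay_factor ** (epoch // cfg.decay_every)


def lr_schedule(cfg: TrainConfig) -> List[float]:
    """Learning rate for every epoch of the run."""
    return [lr_at_epoch(cfg, e) for e in range(cfg.epochs)]


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch, lr in enumerate(lr_schedule(cfg)):
        print(f"epoch {epoch:2d}: lr = {lr:.1e}")
```

Under these assumptions, a 35-epoch run crosses two decay boundaries (at epochs 16 and 32), so the final three epochs train at roughly 1e-6.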