Continual Audio-Visual Sound Separation

Authors: Weiguo Pian, Yiyang Nan, Shijian Deng, Shentong Mo, Yunhui Guo, Yapeng Tian

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that ContAV-Sep can effectively mitigate catastrophic forgetting and achieve significantly better performance compared to other continual learning baselines for audio-visual sound separation. Code is available at: https://github.com/weiguoPian/ContAV-Sep_NeurIPS2024. ... In this section, we first introduce the setup of our experiments, i.e., dataset, baselines, evaluation metrics, and the implementation details.
Researcher Affiliation | Academia | 1 The University of Texas at Dallas, 2 Brown University, 3 Carnegie Mellon University
Pseudocode | No | The paper describes its methods in text and uses mathematical formulations but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at: https://github.com/weiguoPian/ContAV-Sep_NeurIPS2024.
Open Datasets | Yes | Following common practice [83, 88, 14], we conducted experiments on MUSIC-21 [83]... To further validate the efficacy of our method across a broader sound domain, we conduct experiments using the AVE [68] and the VGGSound [13] datasets in the appendix.
Dataset Splits | Yes | We randomly split them into training, validation, and testing sets with 840, 100, and 100 videos, respectively.
Hardware Specification | Yes | We train our proposed method and all baselines on an NVIDIA RTX A5000 GPU.
Software Dependencies | No | The paper mentions software like PyTorch [51], Detic [87], CLIP [56], and VideoMAE [69], but it does not provide specific version numbers for these key software components as required for reproducibility.
Experiment Setup | Yes | In our proposed Cross-modal Similarity Distillation Constraint (CrossSDC), the balance weights λins and λcls are set to 0.1 and 0.3, respectively, and the balance weight λdist for the output distillation loss is set to 0.3 in our experiments. For the memory set, we set the number of samples per old class to 1, as do the other baselines that involve a memory set.
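
For reference, a minimal sketch of how the reported balance weights could enter the overall training objective. Only the weight values (0.1, 0.3, 0.3) and the one-sample-per-old-class memory size come from the quoted setup; the loss-term names and the weighted-sum structure are assumptions, not the authors' verified implementation.

```python
import torch

# Assumed constants taken from the quoted experiment setup.
LAMBDA_INS = 0.1            # instance-level CrossSDC term
LAMBDA_CLS = 0.3            # class-level CrossSDC term
LAMBDA_DIST = 0.3           # output distillation term
MEMORY_PER_OLD_CLASS = 1    # exemplars kept per old class in the memory set

def total_loss(separation_loss: torch.Tensor,
               ins_distill_loss: torch.Tensor,
               cls_distill_loss: torch.Tensor,
               output_distill_loss: torch.Tensor) -> torch.Tensor:
    """Hypothetical weighted sum of the separation objective and the
    distillation terms; term names are placeholders for illustration."""
    return (separation_loss
            + LAMBDA_INS * ins_distill_loss
            + LAMBDA_CLS * cls_distill_loss
            + LAMBDA_DIST * output_distill_loss)
```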