Continual Audio-Visual Sound Separation
Authors: Weiguo Pian, Yiyang Nan, Shijian Deng, Shentong Mo, Yunhui Guo, Yapeng Tian
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that ContAV-Sep can effectively mitigate catastrophic forgetting and achieve significantly better performance compared to other continual learning baselines for audio-visual sound separation. Code is available at: https://github.com/weiguoPian/ContAV-Sep_NeurIPS2024. ... In this section, we first introduce the setup of our experiments, i.e., dataset, baselines, evaluation metrics, and the implementation details. |
| Researcher Affiliation | Academia | 1 The University of Texas at Dallas 2 Brown University 3 Carnegie Mellon University |
| Pseudocode | No | The paper describes its methods in text and uses mathematical formulations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/weiguoPian/ContAV-Sep_NeurIPS2024. |
| Open Datasets | Yes | Following common practice [83, 88, 14], we conducted experiments on MUSIC-21 [83]... To further validate the efficacy of our method across a broader sound domain, we conduct experiments using the AVE [68] and the VGGSound [13] datasets in the appendix. |
| Dataset Splits | Yes | we randomly split them into training, validation, and testing sets with 840, 100, and 100 videos, respectively. |
| Hardware Specification | Yes | We train our proposed method and all baselines on a NVIDIA RTX A5000 GPU. |
| Software Dependencies | No | The paper mentions software like PyTorch [51], Detic [87], CLIP [56], and VideoMAE [69], but it does not provide specific version numbers for these key software components as required for reproducibility. |
| Experiment Setup | Yes | In our proposed Cross-modal Similarity Distillation Constraint (CrossSDC), the balance weights λins and λcls are set to 0.1 and 0.3, respectively. The balance weight λdist for the output distillation loss is set to 0.3 in our experiments. For the memory set, we set the number of samples in each old class to 1, as do the other baselines that involve a memory set. |
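The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is purely illustrative: the variable names below are hypothetical and do not come from the authors' released code.

```python
# Hypothetical config sketch of the reported hyperparameters; names are
# illustrative, not taken from the ContAV-Sep repository.
config = {
    "lambda_ins": 0.1,          # instance-level CrossSDC balance weight
    "lambda_cls": 0.3,          # class-level CrossSDC balance weight
    "lambda_dist": 0.3,         # output distillation loss weight
    "memory_per_old_class": 1,  # exemplars stored per old class in the memory set
}

# Total auxiliary-loss weight under these settings (illustrative arithmetic).
aux_weight_sum = config["lambda_ins"] + config["lambda_cls"] + config["lambda_dist"]
print(round(aux_weight_sum, 2))  # → 0.7
```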