Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Authors: Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "extensive experiments demonstrate that this new paradigm outperforms other fusion-based methods in both the unseen class and cross-dataset settings." and "To evaluate the grounding performance of our model, we conduct tests on AVS-Benchmarks and use mean intersection over union (mIoU) and F-score as the performance metrics, following previous works (Zhou et al. 2022b; Gao et al. 2023). Additionally, to assess the generalization ability, we split zero-shot and few-shot testing subsets based on AVS-Benchmarks and VGG-SS datasets." |
| Researcher Affiliation | Academia | 1 Gaoling School of Artificial Intelligence, Renmin University of China; 2 School of Computer Science, Northwest Polytechnical University; 3 LIESMARS, Wuhan University; 4 College of Computer Science and Technology, Zhejiang University. Emails: yaoting.wang@outlook.com, liuweisong@mail.nwpu.edu.cn, guangyaoli@ruc.edu.cn, jian.ding@whu.edu.cn, dihu@ruc.edu.cn, xilizju@zju.edu.cn |
| Pseudocode | No | The paper describes the method using mathematical equations and figures, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Project page: https://github.com/GeWu-Lab/Generalizable-Audio-Visual-Segmentation |
| Open Datasets | Yes | "To evaluate the grounding performance of our model, we conduct tests on AVS-Benchmarks and use mean intersection over union (mIoU) and F-score as the performance metrics, following previous works (Zhou et al. 2022b; Gao et al. 2023). Additionally, to assess the generalization ability, we split zero-shot and few-shot testing subsets based on AVS-Benchmarks and VGG-SS datasets." and "AVS-Benchmarks (Zhou et al. 2022b) is a dataset specifically designed for AVS tasks." and "VGG-SS (Chen et al. 2021) is a dataset designed for the AVL task performance test." |
| Dataset Splits | No | "To evaluate the grounding performance of our model, we conduct tests on AVS-Benchmarks and use mean intersection over union (mIoU) and F-score as the performance metrics, following previous works (Zhou et al. 2022b; Gao et al. 2023). Additionally, to assess the generalization ability, we split zero-shot and few-shot testing subsets based on AVS-Benchmarks and VGG-SS datasets." and "Refer to the project page for detailed split settings." |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments (e.g., GPU models, CPU types, or cloud computing specifications). |
| Software Dependencies | No | The paper mentions models like VGGish and SAM, and techniques like contrastive learning and Fourier transform, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper describes the model architecture and learning objectives but does not explicitly state specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations for reproduction. |
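
The table cites mIoU and F-score as the evaluation metrics but does not show how they are computed. The following is a minimal sketch of both on binary masks, not the authors' evaluation code. Its assumptions: masks are nested lists of 0/1 values, an empty prediction against an empty ground truth scores an IoU of 1.0, and the F-score weight `beta_sq` defaults to 0.3 (a value commonly used in segmentation benchmarks, not one stated in this summary). All function names are hypothetical.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary mask: rows of 0/1 values (assumed layout)


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """Average the per-sample IoU over a list of (prediction, ground truth) pairs."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


def f_score(pred: Mask, gt: Mask, beta_sq: float = 0.3) -> float:
    """Weighted F-measure; beta_sq = 0.3 is an assumed default, not from the paper."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1
            elif p and not g:
                fp += 1
            elif g and not p:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta_sq * precision + recall
    return (1 + beta_sq) * precision * recall / denom if denom else 0.0


if __name__ == "__main__":
    prediction = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 0]]
    print(iou(prediction, ground_truth))
    print(f_score(prediction, ground_truth))
```

With `beta_sq` below 1, precision is weighted more heavily than recall, so over-segmented predictions are penalized more than under-segmented ones.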