Weakly-Supervised Audio-Visual Segmentation
Authors: Shentong Mo, Bhiksha Raj
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on AVSBench demonstrate the effectiveness of our WS-AVS in the weakly-supervised audio-visual segmentation of single-source and multi-source scenarios. |
| Researcher Affiliation | Academia | Shentong Mo (CMU, MBZUAI); Bhiksha Raj (CMU, MBZUAI) |
| Pseudocode | No | The paper describes its method using text and equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | AVSBench [1] contains 4,932 videos with 10,852 total frames from 23 categories including animals, humans, instruments, etc. Following prior work [1], we use the split of 3,452/740/740 videos for train/val/test in single source segmentation. |
| Dataset Splits | Yes | Following prior work [1], we use the split of 3,452/740/740 videos for train/val/test in single source segmentation. (A split sanity-check sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The model is trained for 20 epochs with a batch size of 64, using the Adam optimizer with default hyper-parameters β1 = 0.9, β2 = 0.999 and a learning rate of 1e-4. (A training-loop sketch appears after the table.) |
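
Where the paper pins down concrete numbers, they can be checked mechanically. The sketch below is a minimal sanity check of the single-source split protocol, assuming the video IDs have already been loaded from AVSBench's released annotations; `check_split_sizes` and the `video_ids_by_split` layout are hypothetical names, and only the 3,452/740/740 counts come from the paper.

```python
# Sanity-check that a loaded AVSBench single-source split matches the
# 3,452/740/740 train/val/test protocol of prior work [1]. The dict layout
# is an assumption; AVSBench's released annotation format may differ.
EXPECTED_SINGLE_SOURCE = {"train": 3452, "val": 740, "test": 740}

def check_split_sizes(video_ids_by_split: dict) -> None:
    for split, expected in EXPECTED_SINGLE_SOURCE.items():
        got = len(video_ids_by_split.get(split, []))
        assert got == expected, f"{split}: got {got} videos, expected {expected}"
```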
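
The reported optimization settings map directly onto a standard PyTorch training loop. The sketch below is a hypothetical reconstruction, not the authors' code (none was released): the stub model and dummy loss are placeholders, and only the hyper-parameters (Adam with β1 = 0.9, β2 = 0.999, learning rate 1e-4, 20 epochs, batch size 64) are taken from the paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model: the real WS-AVS architecture is not reproduced here;
# this stub only exists so the optimizer has parameters to update.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 1))

# Optimizer settings as reported: Adam with default betas and lr = 1e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

# Dummy frames standing in for AVSBench video frames; batch size 64 as reported.
frames = torch.randn(128, 3, 64, 64)
loader = DataLoader(TensorDataset(frames), batch_size=64, shuffle=True)

for epoch in range(20):  # 20 training epochs, as reported
    for (batch,) in loader:
        pred = model(batch)
        loss = pred.mean()  # dummy objective; the paper's weakly-supervised loss is not reproduced
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```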