Improving Audio-Visual Segmentation with Bidirectional Generation
Authors: Dawei Hao, Yuxin Mao, Bowen He, Xiaodong Han, Yuchao Dai, Yiran Zhong
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To showcase the effectiveness of our approach, we conduct comprehensive experiments and analyses on the widely recognized AVSBench benchmark. |
| Researcher Affiliation | Collaboration | ¹Bilibili Inc., Shanghai, China; ²OpenNLPLab, Shanghai AI Lab, Shanghai, China; ³Northwestern Polytechnical University, Shaanxi, China; ⁴NIO, Shanghai, China |
| Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper. |
| Open Source Code | Yes | Code is released at: https://github.com/OpenNLPLab/AVS-bidirectional. |
| Open Datasets | Yes | We conduct training and evaluation experiments on the AVSBench (Zhou et al. 2022) dataset. |
| Dataset Splits | No | The paper describes training and evaluation settings but does not explicitly detail validation dataset splits with proportions or sample counts. |
| Hardware Specification | Yes | We train our model using PyTorch on an NVIDIA Tesla V100. |
| Software Dependencies | No | We train our model using PyTorch (no version is specified, and no other software dependencies with versions are listed). |
| Experiment Setup | Yes | We train our model using PyTorch on an NVIDIA Tesla V100 and utilize the Adam optimizer with a learning rate of 10⁻⁴. The batch size is set to 8, and we train on the Single-source subset for 15 epochs and the Multi-sources subset for 30 epochs. We resize all video frames to 224 × 224. |
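
The reported experiment setup can be collected into a small configuration object, which may help when trying to reproduce the runs. The following is a minimal sketch, not the authors' code: the names `TrainConfig` and `config_for` are hypothetical, the subset keys are illustrative labels, and the learning rate is read as 10⁻⁴ from the setup description. Only the numeric values (Adam, batch size 8, 15 and 30 epochs, 224 × 224 frames) come from the table above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters reported in the table."""
    subset: str                      # which AVSBench subset is trained on
    epochs: int                      # number of training epochs for that subset
    optimizer: str = "Adam"          # optimizer named in the setup description
    learning_rate: float = 1e-4      # assumed reading of the reported learning rate
    batch_size: int = 8              # reported batch size
    frame_size: Tuple[int, int] = (224, 224)  # all video frames are resized to this


# Epoch counts reported per subset; the dictionary keys are illustrative labels.
_EPOCHS_BY_SUBSET = {
    "single-source": 15,
    "multi-sources": 30,
}


def config_for(subset: str) -> TrainConfig:
    """Return the reported training configuration for the given subset label."""
    if subset not in _EPOCHS_BY_SUBSET:
        raise ValueError(f"unknown subset: {subset!r}")
    return TrainConfig(subset=subset, epochs=_EPOCHS_BY_SUBSET[subset])


if __name__ == "__main__":
    for name in _EPOCHS_BY_SUBSET:
        print(config_for(name))
```

In a PyTorch training script, `learning_rate` would typically be passed as the `lr` argument of `torch.optim.Adam`. Since the paper lists no PyTorch version, any other optimizer settings would have to be taken from the released repository rather than from this sketch.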