Improving Audio-Visual Segmentation with Bidirectional Generation

Authors: Dawei Hao, Yuxin Mao, Bowen He, Xiaodong Han, Yuchao Dai, Yiran Zhong

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To showcase the effectiveness of our approach, we conduct comprehensive experiments and analyses on the widely recognized AVSBench benchmark.
Researcher Affiliation | Collaboration | 1 Bilibili Inc., Shanghai, China; 2 OpenNLPLab, Shanghai AI Lab, Shanghai, China; 3 Northwestern Polytechnical University, Shaanxi, China; 4 NIO, Shanghai, China
Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper.
Open Source Code | Yes | Code is released in: https://github.com/OpenNLPLab/AVS-bidirectional.
Open Datasets | Yes | We conduct training and evaluation experiments on the AVSBench (Zhou et al. 2022) dataset.
Dataset Splits | No | The paper describes training and evaluation settings but does not explicitly detail validation dataset splits with proportions or sample counts.
Hardware Specification | Yes | We train our model using PyTorch on an NVIDIA Tesla V100.
Software Dependencies | No | We train our model using PyTorch (no version is specified, and no other software dependencies with versions are listed).
Experiment Setup | Yes | We train our model using PyTorch on an NVIDIA Tesla V100 and utilize the Adam optimizer with a learning rate of 10^-4. The batch size is set to 8, and we train on the Single-source subset for 15 epochs and the Multi-sources subset for 30 epochs. We resize all video frames to 224 × 224.
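
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the names `TrainConfig`, `EPOCHS_BY_SUBSET`, and `make_config` are hypothetical, and only the numeric values (learning rate, batch size, epoch counts, frame size) come from the row above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from the Experiment Setup row (field names are hypothetical)."""
    subset: str
    epochs: int
    learning_rate: float = 1e-4             # Adam learning rate reported in the paper
    batch_size: int = 8                     # reported batch size
    frame_size: Tuple[int, int] = (224, 224)  # all video frames are resized to this


# Epoch counts per AVSBench subset, as reported in the paper.
EPOCHS_BY_SUBSET = {"single-source": 15, "multi-sources": 30}


def make_config(subset: str) -> TrainConfig:
    """Build the configuration for one subset; unknown subset names are rejected."""
    if subset not in EPOCHS_BY_SUBSET:
        raise ValueError(f"unknown subset: {subset!r}")
    return TrainConfig(subset=subset, epochs=EPOCHS_BY_SUBSET[subset])


if __name__ == "__main__":
    for name in EPOCHS_BY_SUBSET:
        print(make_config(name))
```

In a PyTorch training script, these values would typically be passed on as, for example, `torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)`. The paper does not state a PyTorch version, so any such wiring is an assumption.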