Panoramic Video Salient Object Detection with Ambisonic Audio Guidance
Authors: Xiang Li, Haoyuan Cao, Shijie Zhao, Junlin Li, Li Zhang, Bhiksha Raj
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results verify the effectiveness of our proposed components and demonstrate that our method achieves state-of-the-art performance on the ASOD60K dataset. |
| Researcher Affiliation | Collaboration | 1 Carnegie Mellon University, PA, USA. 2 Bytedance Inc., San Diego, CA, USA. 3 Bytedance Inc., Shenzhen, China. 4 Mohammed bin Zayed University of AI, Abu Dhabi, UAE. |
| Pseudocode | No | The provided text does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We conduct experiments on the ASOD60K dataset (Zhang, Chao, and Zhang 2021) |
| Dataset Splits | No | The test set of ASOD60K contains three subsets split by sound event classes miscellanea, music, and speaking. While a test set is mentioned, the paper does not provide specific details on the train/validation splits (percentages, counts, or explicit standard split names). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU models, CPU types). |
| Software Dependencies | No | Our method is implemented with PyTorch. This statement mentions a software framework but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The model is trained for 20 epochs with a learning rate of 1e-4. We adopt a batchsize of 2 and an AdamW (Loshchilov and Hutter 2017) optimizer with weight decay 0. All images are cropped to have the longest side of 832 pixels and the shortest side of 416 pixels during training and evaluation. The window size is set to 3. The λ_distill is set to 5.0 and λ_dice is set to 1 if no specification. |
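
The hyperparameters in the Experiment Setup row can be collected into a single configuration object, which may help when attempting a reproduction. The following is a minimal sketch, not the authors' code (the table notes that none was released). The class name `ExperimentConfig`, its field names, and the helper `total_loss` are hypothetical. Combining the two loss terms as a weighted sum is an assumption; the summary only reports the two weights, and it does not say what the window size of 3 refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Values as reported in the Experiment Setup row (field names are illustrative)."""
    epochs: int = 20
    learning_rate: float = 1e-4
    batch_size: int = 2
    optimizer: str = "AdamW"
    weight_decay: float = 0.0
    long_side: int = 832       # longest image side in pixels (train and eval)
    short_side: int = 416      # shortest image side in pixels (train and eval)
    window_size: int = 3       # reported as "window size"; its exact role is not stated here
    lambda_distill: float = 5.0
    lambda_dice: float = 1.0


def total_loss(dice_loss: float, distill_loss: float, cfg: ExperimentConfig) -> float:
    """Assumed combination: weighted sum of the dice and distillation terms."""
    return cfg.lambda_dice * dice_loss + cfg.lambda_distill * distill_loss


cfg = ExperimentConfig()
print(cfg)
print(total_loss(dice_loss=0.5, distill_loss=0.1, cfg=cfg))
```

In a PyTorch reproduction, the optimizer settings would presumably map onto `torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate, weight_decay=cfg.weight_decay)`. Hardware, library versions, and the train/validation split are not reported and would still have to be chosen by the reader.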