Binaural Audio-Visual Localization
Authors: Xinyi Wu, Zhenyao Wu, Lili Ju, Song Wang
AAAI 2021, pp. 2961-2968
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the FAIR-Play and YT-MUSIC datasets demonstrate the effectiveness of the proposed method and show that binaural audio can greatly improve the performance of localizing sound sources, especially when the quality of the visual information is limited. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, University of South Carolina, USA 2Department of Mathematics, University of South Carolina, USA {xinyiw, zhenyao}@email.sc.edu, ju@math.sc.edu, songwang@cec.sc.edu |
| Pseudocode | No | The paper describes the network architecture and process but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | FAIR-Play (Gao and Grauman 2019a): FAIR-Play is the first audio-visual dataset recorded with both videos and professional binaural audios in a music room... YT-MUSIC (Morgado et al. 2018): The YT-MUSIC dataset is collected from Youtube for spatial audio generation by Morgado et al. (2018)... |
| Dataset Splits | No | The paper mentions using "train/test splits" and provides specific training and testing video counts for YT-MUSIC (250 for training and 67 for testing), but it does not explicitly define a separate validation split or its size. |
| Hardware Specification | Yes | BAVNet is implemented using Pytorch and trained with one Nvidia 2080Ti GPU. |
| Software Dependencies | No | The paper mentions that BAVNet is implemented using Pytorch but does not provide specific version numbers for Pytorch or any other software dependencies, which are required for a reproducible description. |
| Experiment Setup | Yes | We use Adam as the optimizer with a weight decay of 0.0001. The initial learning rate is set to 0.0001 and is decayed by a factor of 0.8 every 10 epochs. The network is trained for 200 epochs in total with a batch size of 1. (A minimal PyTorch sketch of this schedule follows the table.) |
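
The hyperparameters quoted under Experiment Setup map directly onto standard PyTorch APIs (`torch.optim.Adam` plus a `StepLR` schedule). Below is a minimal sketch of that configuration; only the optimizer and schedule values (Adam, weight decay 0.0001, initial learning rate 0.0001, decay factor 0.8 every 10 epochs, 200 epochs, batch size 1) come from the paper, while the model, loss, and data are hypothetical placeholders, since no source code was released.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

# Placeholder standing in for BAVNet, whose code was not released.
model = nn.Sequential(nn.Flatten(), nn.Linear(16, 1))
criterion = nn.MSELoss()  # assumed loss; the paper's actual objective differs

# Values below are the ones reported in the paper.
optimizer = Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = StepLR(optimizer, step_size=10, gamma=0.8)  # lr *= 0.8 every 10 epochs

for epoch in range(200):  # 200 epochs in total
    # Dummy single-sample batch, matching the reported batch size of 1.
    inputs, target = torch.randn(1, 16), torch.randn(1, 1)
    optimizer.zero_grad()
    loss = criterion(model(inputs), target)
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule once per epoch
```

Under this schedule the learning rate is decayed 19 times before the final epochs, ending at roughly 1e-4 × 0.8^19 ≈ 1.4e-6.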