Binaural Audio-Visual Localization

Authors: Xinyi Wu, Zhenyao Wu, Lili Ju, Song Wang (pp. 2961-2968)

AAAI 2021

Reproducibility Variable / Result / LLM Response

Research Type: Experimental
"Experimental results on Fair-Play and YT-Music datasets demonstrate the effectiveness of the proposed method and show that binaural audio can greatly improve the performance of localizing the sound sources, especially when the quality of the visual information is limited."

Researcher Affiliation: Academia
"(1) Department of Computer Science and Engineering, University of South Carolina, USA; (2) Department of Mathematics, University of South Carolina, USA. {xinyiw, zhenyao}@email.sc.edu, ju@math.sc.edu, songwang@cec.sc.edu"

Pseudocode: No
The paper describes the network architecture and training process but does not include any pseudocode or algorithm blocks.

Open Source Code: No
The paper does not provide any statement about releasing source code, nor a link to a code repository for the described methodology.

Open Datasets: Yes
"FAIR-Play (Gao and Grauman 2019a): FAIR-Play is the first audio-visual dataset recorded with both videos and professional binaural audios in a music room... YT-MUSIC (Morgado et al. 2018): The YT-MUSIC dataset is collected from Youtube for spatial audio generation by Morgado et al. (2018)..."

Dataset Splits: No
The paper mentions using "train/test splits" and gives training and testing video counts for YT-MUSIC (250 for training and 67 for testing), but it does not define a separate validation split or its size.

Hardware Specification: Yes
"BAVNet is implemented using Pytorch and trained with one Nvidia 2080Ti GPU."

Software Dependencies: No
The paper states that BAVNet is implemented using PyTorch but does not give version numbers for PyTorch or any other software dependencies, which are required for a reproducible description.

Experiment Setup: Yes
"We take Adam as the optimizer by setting weight decay to be 0.0001. The starting learning rate is set to 0.0001, then it decayed by multiplying it with the decay factor 0.8 for every 10 epochs. We train the network for 200 epochs in total with the batch size being 1."
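The quoted setup can be sketched as a plain-Python learning-rate schedule; in PyTorch the same configuration would correspond to `torch.optim.Adam(params, lr=1e-4, weight_decay=1e-4)` with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.8)`. The helper name below is ours, not the paper's:

```python
def lr_at_epoch(epoch: int, base_lr: float = 1e-4,
                gamma: float = 0.8, step: int = 10) -> float:
    """Learning rate in effect at a given epoch under a step-decay schedule.

    Mirrors the reported setup: start at 1e-4 and multiply by 0.8
    every 10 epochs. Equivalent to PyTorch's StepLR(step_size=10, gamma=0.8).
    """
    return base_lr * gamma ** (epoch // step)

# Over the reported 200 epochs (batch size 1):
first = lr_at_epoch(0)     # starting rate, 1e-4
decayed = lr_at_epoch(10)  # after the first decay, roughly 8e-5
```

Because decay is applied every 10 epochs over 200 epochs, the final learning rate is `1e-4 * 0.8**19`, i.e. about 1.4e-6.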