A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Authors: Shentong Mo, Pedro Morgado
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using the new protocol, we conducted an extensive evaluation of prior methods, and found that most prior works are not capable of identifying negatives and suffer from significant overfitting problems (relying heavily on early stopping for best results). We also propose a new approach for visual sound source localization that addresses both these problems. |
| Researcher Affiliation | Academia | Shentong Mo (Carnegie Mellon University); Pedro Morgado (University of Wisconsin-Madison) |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | Code and pre-trained models are available at https://github.com/stoneMo/SLAVC. |
| Open Datasets | Yes | We evaluate the effectiveness of the proposed method on two datasets: Flickr-SoundNet [1] and VGG-Sound Sources [45]. |
| Dataset Splits | No | The paper mentions using "a subset of 144k samples for training" and "extended test sets", and discusses validating the model for early stopping, but does not provide explicit details on the size or composition of a separate validation split. |
| Hardware Specification | No | Models are trained with a batch size of 128 on 2 GPUs for 20 epochs (which we found to be enough to achieve convergence in most cases). |
| Software Dependencies | No | Our implementation, available at https://github.com/stoneMo/SLAVC, is based on the PyTorch [49] deep learning toolkit. No specific library versions are given. |
| Experiment Setup | Yes | The visual encoder is initialized with ImageNet [47] pre-trained weights [6, 9, 5]. The output dimensions of the audio and visual encoders (i.e., the output of projection functions g(·)) was kept at 512, the momentum encoders update factor at 0.999, and the visual dropout at 0.9. No audio dropout is applied. Models are trained with a batch size of 128 on 2 GPUs for 20 epochs... We used the Adam [48] optimizer with β1 = 0.9, β2 = 0.999, learning rate of 1e-4 and weight decay of 1e-4. |
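
The Experiment Setup row above pins down most of the training hyperparameters. As a reading aid, here is a minimal PyTorch sketch of that configuration. The encoder modules and their 512-dimensional input feature size are placeholders (the paper's actual backbones and the SLAVC localization loss are not reproduced here); the embedding dimension, momentum-encoder update factor, visual dropout, and Adam settings are taken directly from the quoted text.

```python
# Minimal sketch of the quoted training configuration, assuming
# hypothetical stand-in encoders. Hyperparameters (512-d outputs,
# EMA momentum 0.999, visual dropout 0.9, Adam with lr/wd 1e-4)
# come from the paper's Experiment Setup description.
import copy
import torch
import torch.nn as nn

EMBED_DIM = 512       # output dim of the projection g(.) for both streams
EMA_MOMENTUM = 0.999  # momentum-encoder update factor
VISUAL_DROPOUT = 0.9  # dropout on visual features; no audio dropout

# Stand-in encoders (hypothetical 512-d input features); the paper
# initializes its visual backbone with ImageNet pre-trained weights,
# which is not reproduced here.
visual_encoder = nn.Sequential(nn.Dropout(VISUAL_DROPOUT), nn.Linear(512, EMBED_DIM))
audio_encoder = nn.Linear(512, EMBED_DIM)

# Momentum (EMA) copies of each encoder, updated without gradients.
visual_ema = copy.deepcopy(visual_encoder)
audio_ema = copy.deepcopy(audio_encoder)
for ema in (visual_ema, audio_ema):
    for p in ema.parameters():
        p.requires_grad_(False)

@torch.no_grad()
def ema_update(online: nn.Module, target: nn.Module, m: float = EMA_MOMENTUM) -> None:
    """Parameter-wise EMA update: target <- m * target + (1 - m) * online."""
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(m).add_(p_o, alpha=1.0 - m)

# Optimizer exactly as quoted: Adam, betas (0.9, 0.999), lr 1e-4, wd 1e-4.
params = list(visual_encoder.parameters()) + list(audio_encoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-4)

# Training schedule per the paper: batch size 128 on 2 GPUs for 20 epochs;
# ema_update(...) would be called on both streams after each optimizer step.
```

Note that the hardware detail ("2 GPUs") and the absence of pinned software versions are exactly why the Hardware Specification and Software Dependencies rows above are marked "No": the setup is reproducible in outline but not fully specified.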