Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Authors: Yingying Fan, Yu Wu, Bo Du, Yutian Lin
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our simple yet effective approach outperforms state-of-the-art methods by a large margin. Code and data are available at https://github.com/fyyCS/LSLD. ... Section 4 Experiments |
| Researcher Affiliation | Academia | Yingying Fan, Yu Wu, Bo Du, Yutian Lin School of Computer Science, Hubei Luojia Laboratory, Wuhan University {fanyingying_cs, wuyucs, dubo, yutian.lin}@whu.edu.cn |
| Pseudocode | No | The paper includes 'Figure 1: Algorithm Overview', a block diagram, but it does not present structured pseudocode or a formal algorithm block with detailed steps. |
| Open Source Code | Yes | Code and data are available at https://github.com/fyyCS/LSLD. |
| Open Datasets | Yes | In the AVVP task, we only evaluate our method on the Look, Listen and Parse (LLP) Dataset [4] following previous AVVP work. ... For the training process, we use 10,000 video clips with only video-level event labels. |
| Dataset Splits | Yes | The remaining 1,849 validation and test videos carry modality- and segment-level labels (i.e., the start and end time of each event on the audio and visual tracks). We conduct experiments following the official data splits from the LLP dataset. (See the annotation sketch below the table.) |
| Hardware Specification | Yes | We conduct the training and evaluation processes on a single NVIDIA RTX 2080 Ti GPU with 11 GB memory. |
| Software Dependencies | No | The paper names the models and tools it uses (e.g., CLIP, CLAP, ResNet, VGGish, the Adam optimizer) but does not give version numbers for any software component or library, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | Following HAN [4], we adopt the Adam optimizer with a learning rate of 2e-4, decayed by a factor of 0.25 every 6 epochs. We train the model with a batch size of 32 for 20 epochs. ... α is set to 4 and β is 0.4. (See the training-loop sketch below the table.) |
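
The two supervision regimes described in the dataset rows (weak video-level tags for the 10,000 training clips, per-modality temporal labels for the 1,849 validation/test clips) can be made concrete with a small sketch. This is illustrative only: the class and field names are hypothetical stand-ins, not the LLP annotation schema, and the example events are made up.

```python
from dataclasses import dataclass, field

@dataclass
class TrainVideo:
    """Training supervision: weak, video-level event tags only."""
    video_id: str
    event_labels: list[str]  # e.g. ["Speech", "Dog"]; no timing, no modality

@dataclass
class EvalSegmentLabel:
    """Evaluation supervision: one event localized on one track."""
    event: str
    modality: str     # "audio" or "visual"
    start_sec: float  # start time of the event on that track
    end_sec: float    # end time of the event on that track

@dataclass
class EvalVideo:
    """Validation/test videos carry the dense, segment-level labels."""
    video_id: str
    segment_labels: list[EvalSegmentLabel] = field(default_factory=list)
```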
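
The reported optimizer settings map onto a standard PyTorch schedule: `StepLR` with `step_size=6` and `gamma=0.25` reproduces "drops by a factor of 0.25 for every 6 epochs". The following is a minimal sketch under stated assumptions, not the authors' implementation (that lives at https://github.com/fyyCS/LSLD): the model, feature dimension, and data are dummy placeholders, the loss is a generic multi-label BCE, and the paper's α and β loss weights are not modeled.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Dummy placeholders: the real model is the authors' HAN-based network,
# and the real inputs are pre-extracted audio/visual features.
model = nn.Linear(512, 25)          # 25 = number of LLP event categories
criterion = nn.BCEWithLogitsLoss()  # generic multi-label loss stand-in

# Reported setup: Adam, lr 2e-4, decayed by a factor of 0.25 every 6 epochs.
optimizer = optim.Adam(model.parameters(), lr=2e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=6, gamma=0.25)

# Random tensors standing in for features and video-level labels.
features = torch.randn(320, 512)
labels = torch.randint(0, 2, (320, 25)).float()
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

for epoch in range(20):  # reported: batch size 32, 20 epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # multiplies lr by 0.25 at epochs 6, 12, 18
```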