Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing
Authors: Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Quantitative and qualitative results demonstrate that the proposed method performs favorably against existing methods on weakly-supervised audio-visual video parsing. We evaluate the proposed method on the LLP [4] dataset. |
| Researcher Affiliation | Collaboration | National Yang Ming Chiao Tung University; UNC Chapel Hill; UC Merced; Snap Research; Google Research; Yonsei University |
| Pseudocode | No | The paper includes 'Figure 1: Algorithmic overview' which is a block diagram, but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models are publicly available. |
| Open Datasets | Yes | We use the Look, Listen and Parse (LLP) Dataset [4] for all experiments. |
| Dataset Splits | Yes | We use the 10000 video clips with only video-level event annotations for model training. The detailed annotations (e.g., individual audio and visual events per second) are available for the remaining 1849 validation and test videos. |
| Hardware Specification | Yes | We implement the proposed method using PyTorch [50], and conduct the training and evaluation processes on a single NVIDIA GTX 1080 Ti GPU with 11 GB memory. |
| Software Dependencies | No | The paper mentions 'PyTorch [50]' but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | No | The paper states 'Visual frames are sampled at 8 fps' and describes the feature extraction process, but it does not provide specific hyperparameters like learning rate, batch size, number of epochs, or optimizer settings. |