Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing

Authors: Shentong Mo, Yapeng Tian

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the LLP [3] dataset validate that our new audio-visual video parsing framework achieves superior results over previous state-of-the-art methods [1, 2, 3, 4]. Empirical results also demonstrate the generalizability of our approach to contrastive learning and label refinement proposed in MA [4].
Researcher Affiliation | Academia | Shentong Mo (Carnegie Mellon University); Yapeng Tian (University of Texas at Dallas)
Pseudocode | No | The paper does not contain an explicitly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code is available at https://github.com/stoneMo/MGN.
Open Datasets | Yes | The Look, Listen and Parse (LLP) dataset [3] contains 11,849 10-second YouTube video clips spanning 25 different event categories, such as car, music, cheering, and speech.
Dataset Splits | Yes | We use 10,000 video clips with only video-level event labels for training. Following the official validation and test splits [3], we develop and test the model on the remaining 1,849 videos, which carry segment-level annotations (e.g., a speech event in the audio starting at 1s and ending at 5s).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU model or CPU type).
Software Dependencies | No | The paper mentions software components such as ResNet-152, 3D ResNet, VGGish, and the Adam optimizer, but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | The model is trained with the Adam [41] optimizer with β1 = 0.9, β2 = 0.999 and an initial learning rate of 3e-4, using a batch size of 16 for 40 epochs. A hedged sketch of this configuration follows below.
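
As a concrete illustration of the reported setup, here is a minimal, hypothetical PyTorch sketch using only the hyperparameters stated above (Adam with β1 = 0.9, β2 = 0.999, initial learning rate 3e-4, batch size 16, 40 epochs). The feature tensors, classifier head, and loss are dummy stand-ins, not the authors' MGN implementation; the real code is in the linked repository.

```python
# Hypothetical training-loop sketch of the reported configuration.
# Only the optimizer/batch/epoch hyperparameters come from the paper;
# the data, model head, and loss below are placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins: pooled 512-d audio/visual features and 25 event classes (LLP).
# The real training set consists of the 10,000 clips with video-level labels only.
audio_feats = torch.randn(160, 512)
visual_feats = torch.randn(160, 512)
weak_labels = torch.randint(0, 2, (160, 25)).float()  # video-level multi-label targets
train_set = TensorDataset(audio_feats, visual_feats, weak_labels)

# Placeholder classifier head standing in for the MGN model.
model = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 25))
criterion = nn.BCEWithLogitsLoss()

# Reported setup: Adam, beta1=0.9, beta2=0.999, lr=3e-4, batch size 16, 40 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, betas=(0.9, 0.999))
loader = DataLoader(train_set, batch_size=16, shuffle=True)

for epoch in range(40):
    for audio, visual, labels in loader:
        optimizer.zero_grad()
        logits = model(torch.cat([audio, visual], dim=1))
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
```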