Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing
Authors: Shentong Mo, Yapeng Tian
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the LLP [3] dataset validate that our new audio-visual video parsing framework achieves superior results over previous state-of-the-art methods [1, 2, 3, 4]. Empirical results also demonstrate the generalizability of our approach to contrastive learning and label refinement proposed in MA [4]. |
| Researcher Affiliation | Academia | Shentong Mo Carnegie Mellon University Yapeng Tian University of Texas at Dallas |
| Pseudocode | No | The paper does not contain an explicitly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/stoneMo/MGN. |
| Open Datasets | Yes | The Look, Listen and Parse (LLP) Dataset [3] contains 11,849 YouTube video clips of 10-seconds long from 25 different event categories, such as car, music, cheering, speech, etc. |
| Dataset Splits | Yes | We use 10,000 video clips with only video-level event labels for training. Following the official splits [3] of validation and test sets, we develop and test the model on the remaining 1849 videos with the segment-level annotations, i.e., the speech event for audio starts at 1s and ends at 5s. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU model, CPU type). |
| Software Dependencies | No | The paper mentions software components like ResNet-152, 3D ResNet, VGGish, and Adam optimizer, but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The model is trained with Adam [41] optimizer with β1=0.9, β2=0.999 and with an initial learning rate of 3e-4. We train the model with a batch size of 16 for 40 epochs. |
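
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration, which also makes the implied training length easy to check. The following is a minimal sketch in plain Python, not the authors' code: the names `TrainConfig`, `total_steps`, and `adam_step` are hypothetical, the epsilon value is an assumption (the quoted text does not report it), and the step count assumes every one of the 10,000 training clips is seen once per epoch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the Experiment Setup and Dataset Splits rows.
    lr: float = 3e-4
    beta1: float = 0.9
    beta2: float = 0.999
    batch_size: int = 16
    epochs: int = 40
    train_clips: int = 10_000
    # Assumption: epsilon is not reported; 1e-8 is a common default.
    eps: float = 1e-8


def total_steps(cfg: TrainConfig) -> int:
    """Number of optimizer steps implied by the reported setup."""
    steps_per_epoch = math.ceil(cfg.train_clips / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


def adam_step(param, grad, m, v, t, cfg):
    """One Adam update for a scalar parameter; t is the 1-based step index."""
    m = cfg.beta1 * m + (1 - cfg.beta1) * grad          # first-moment estimate
    v = cfg.beta2 * v + (1 - cfg.beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - cfg.beta1 ** t)                    # bias correction
    v_hat = v / (1 - cfg.beta2 ** t)
    param = param - cfg.lr * m_hat / (math.sqrt(v_hat) + cfg.eps)
    return param, m, v


cfg = TrainConfig()
print(total_steps(cfg))  # 625 steps per epoch * 40 epochs = 25000
```

Under these assumptions, a batch size of 16 divides the 10,000 training clips evenly, so no partial batch occurs. Because the Software Dependencies row reports no library versions, the sketch deliberately avoids any framework-specific optimizer call.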