MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
Authors: Tieyuan Chen, Huabin Liu, Tianyao He, Yihang Chen, chaofan gan, Xiao Ma, Cheng Zhong, Yang Zhang, Yingxue Wang, Hui Lin, Weiyao Lin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments validate the effectiveness of our framework in providing causal relationships in multi-event videos, outperforming GPT-4o and Video-LLaVA by 5.7% and 4.1%, respectively. |
| Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Lenovo Research, AI Lab, 3 China Academy of Electronics and Information Technology |
| Pseudocode | No | The paper provides model architecture diagrams and mathematical equations but does not include a pseudocode block or algorithm. |
| Open Source Code | Yes | https://github.com/tychen-SJTU/MECD-Benchmark |
| Open Datasets | Yes | The ActivityNet Captions dataset [32] is built on ActivityNet v1.3, which includes 20k 120-second untrimmed YouTube videos. ... We call this new dataset the MECD dataset, where 806 and 299 videos are randomly split for training and testing, respectively. [32] Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, and Juan Carlos Niebles. Dense-captioning events in videos. In Proceedings of the IEEE International Conference on Computer Vision, pages 706-715, 2017. |
| Dataset Splits | No | The paper states that 806 videos are used for training and 299 for testing, but it does not mention a separate validation split or its size. Although such a split is common practice, none is explicitly provided in the text. |
| Hardware Specification | Yes | All the experiments are conducted on 1 NVIDIA A40 GPU. ... The inference speed experiments were conducted on 1 NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper mentions using the BertAdam optimizer and building upon VideoBERT, as well as using the GPT-4 API for data generation. However, it does not specify version numbers for these software components or for other libraries used in the implementation. |
| Experiment Setup | Yes | We train our model for 20 epochs with a learning rate of 16e-5, taking about 6 hours. Our optimizer is consistent with the BertAdam [50] optimizer, with 3 epochs of warm-up. ... Hyperparameters λC, λR, λV, λS are set to 1.0, 4.0, 0.25, 0.05. Maximum input lengths of the caption, the chain of thoughts, and the existence-only descriptions are all set to 50. |
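The reported hyperparameters and the 806/299 train/test split can be collected into a minimal sketch. The variable names, the `split_mecd` helper, and the seeded-shuffle procedure are illustrative assumptions, not the authors' released code; only the numeric values come from the paper.

```python
import random

# Hyperparameters as reported in the paper's experiment setup.
CONFIG = {
    "epochs": 20,
    "warmup_epochs": 3,
    "learning_rate": 16e-5,       # used with a BertAdam-style optimizer
    "lambda_C": 1.0,
    "lambda_R": 4.0,
    "lambda_V": 0.25,
    "lambda_S": 0.05,
    "max_input_length": 50,       # captions, chain of thoughts, existence-only descriptions
}


def split_mecd(video_ids, n_train=806, n_test=299, seed=0):
    """Randomly split MECD video IDs into train/test (806/299 as reported).

    The seed and shuffle are assumptions for reproducibility of this sketch;
    the paper only states that the split is random.
    """
    ids = list(video_ids)
    assert len(ids) == n_train + n_test, "MECD contains 1105 videos in total"
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]


train_ids, test_ids = split_mecd(range(1105))
```

Note that no validation split is produced, consistent with the "Dataset Splits" row above: the paper reports only training and testing subsets.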