Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
Authors: Tieyuan Chen, Huabin Liu, Tianyao He, Yihang Chen, Chaofan Gan, Xiao Ma, Cheng Zhong, Yang Zhang, Yingxue Wang, Hui Lin, Weiyao Lin
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments validate the effectiveness of our framework in providing causal relationships in multi-event videos, outperforming GPT-4o and Video-LLaVA by 5.7% and 4.1%, respectively. |
| Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Lenovo Research, AI Lab, 3 China Academic of Electronics and Information Technology |
| Pseudocode | No | The paper provides model architecture diagrams and mathematical equations but does not include a pseudocode block or algorithm. |
| Open Source Code | Yes | https://github.com/tychen-SJTU/MECD-Benchmark |
| Open Datasets | Yes | The ActivityNet Captions dataset [32] is built on ActivityNet v1.3 which includes 20k 120-second YouTube untrimmed videos. ... We call this new dataset as MECD dataset, where 806 and 299 videos are randomly split for training and testing, respectively. [32] Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, and Juan Carlos Niebles. Dense-captioning events in videos. In Proceedings of the IEEE international conference on computer vision, pages 706-715, 2017. |
| Dataset Splits | No | The paper states that 806 videos are used for training and 299 for testing, but it does not mention a separate validation split or its size. |
| Hardware Specification | Yes | All the experiments are conducted on 1 NVIDIA A40 GPU. ... The inference speed experiments were conducted on 1 NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper mentions using the BertAdam optimizer, building upon VideoBERT, and using the GPT-4 API for data generation. However, it does not specify version numbers for these components or for other libraries used in the implementation. |
| Experiment Setup | Yes | We train our model for 20 epochs with a learning rate of 16e-5, taking about 6 hours. Our optimizer is consistent with the BertAdam [50] optimizer, with 3 epochs of warm-up. ... Hyperparameters λC, λR, λV, λS are set to be 1.0, 4.0, 0.25, 0.05. Maximum input lengths of the caption, the chain of thoughts, and the existence-only descriptions are set to 50. |
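
The values in the Experiment Setup row can be collected into a small configuration object, which may help when trying to reproduce the run. The following is a minimal sketch, not the authors' code: the class name `TrainConfig`, the field names, the helper functions, and the four loss-term arguments are all hypothetical. The linear warm-up followed by a constant rate is an assumed schedule shape (the table only states "3 epochs of warm-up"), and the weighted sum is an assumed way of combining the four λ weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameter values quoted in the Experiment Setup row (field names are hypothetical)."""
    epochs: int = 20
    base_lr: float = 16e-5
    warmup_epochs: int = 3
    lambda_c: float = 1.0   # weight λC
    lambda_r: float = 4.0   # weight λR
    lambda_v: float = 0.25  # weight λV
    lambda_s: float = 0.05  # weight λS
    max_input_len: int = 50  # caption, chain of thoughts, existence-only descriptions


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Learning rate for a 0-indexed epoch.

    ASSUMPTION: linear warm-up over the first `warmup_epochs` epochs,
    then a constant rate. The source does not specify the schedule shape.
    """
    if epoch < cfg.warmup_epochs:
        return cfg.base_lr * (epoch + 1) / cfg.warmup_epochs
    return cfg.base_lr


def total_loss(cfg: TrainConfig, loss_c: float, loss_r: float,
               loss_v: float, loss_s: float) -> float:
    """ASSUMPTION: the four λ values weight four loss terms in a plain sum."""
    return (cfg.lambda_c * loss_c + cfg.lambda_r * loss_r
            + cfg.lambda_v * loss_v + cfg.lambda_s * loss_s)


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in range(5):
        print(epoch, lr_at_epoch(cfg, epoch))
```

Keeping the values in a frozen dataclass makes it harder to change them by accident between runs; the actual schedule and loss definitions should be checked against the released repository.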