MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
Authors: Tieyuan Chen, Huabin Liu, Tianyao He, Yihang Chen, chaofan gan, Xiao Ma, Cheng Zhong, Yang Zhang, Yingxue Wang, Hui Lin, Weiyao Lin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments validate the effectiveness of our framework in providing causal relationships in multi-event videos, outperforming GPT-4o and Video-LLaVA by 5.7% and 4.1%, respectively. |
| Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Lenovo Research, AI Lab, 3 China Academy of Electronics and Information Technology |
| Pseudocode | No | The paper provides model architecture diagrams and mathematical equations but does not include a pseudocode block or algorithm. |
| Open Source Code | Yes | https://github.com/tychen-SJTU/MECD-Benchmark |
| Open Datasets | Yes | The ActivityNet Captions dataset [32] is built on ActivityNet v1.3, which includes 20k 120-second untrimmed YouTube videos. ... We call this new dataset the MECD dataset, where 806 and 299 videos are randomly split for training and testing, respectively. [32] Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, and Juan Carlos Niebles. Dense-captioning events in videos. In Proceedings of the IEEE International Conference on Computer Vision, pages 706-715, 2017. |
| Dataset Splits | No | The paper states that 806 videos are used for training and 299 for testing, but it does not mention a separate validation split or its size. Although such a split is common practice, none is explicitly provided in the text. |
| Hardware Specification | Yes | All the experiments are conducted on 1 NVIDIA A40 GPU. ... The inference speed experiments were conducted on 1 NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper mentions using the BertAdam optimizer and building upon VideoBERT, as well as using the GPT-4 API for data generation. However, it does not specify version numbers for these software components or for other libraries used in the implementation. |
| Experiment Setup | Yes | We train our model for 20 epochs with a learning rate of 16e-5, taking about 6 hours. Our optimizer is consistent with the BertAdam [50] optimizer, with 3 epochs of warm-up. ... Hyperparameters λC, λR, λV, λS are set to 1.0, 4.0, 0.25, 0.05. Maximum input lengths of the caption, the chain of thoughts, and the existence-only descriptions are all set to 50. |
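The reported hyperparameters and the 806/299 train/test split can be collected into a minimal sketch. The variable names, the `split_mecd` helper, and the seeded-shuffle procedure are illustrative assumptions, not the authors' released code; only the numeric values come from the paper.

```python
import random

# Hyperparameters as reported in the paper's experiment setup.
CONFIG = {
    "epochs": 20,
    "warmup_epochs": 3,
    "learning_rate": 16e-5,       # used with a BertAdam-style optimizer
    "lambda_C": 1.0,
    "lambda_R": 4.0,
    "lambda_V": 0.25,
    "lambda_S": 0.05,
    "max_input_length": 50,       # captions, chain of thoughts, existence-only descriptions
}


def split_mecd(video_ids, n_train=806, n_test=299, seed=0):
    """Randomly split MECD video IDs into train/test (806/299 as reported).

    The seed and shuffle are assumptions for reproducibility of this sketch;
    the paper only states that the split is random.
    """
    ids = list(video_ids)
    assert len(ids) == n_train + n_test, "MECD contains 1105 videos in total"
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]


train_ids, test_ids = split_mecd(range(1105))
```

Note that no validation split is produced, consistent with the "Dataset Splits" row above: the paper reports only training and testing subsets.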