Video Token Merging for Long Video Understanding

Authors: Seon-Ho Lee, Jue Wang, Zhikang Zhang, David Fan, Xinyu Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results show that we achieve better or comparable performances on the LVU, COIN, and Breakfast datasets.
Researcher Affiliation | Collaboration | Seon-Ho Lee (Korea University, seonholee@mcl.korea.ac.kr); Jue Wang (Amazon AGI, juewangn@amazon.com); Zhikang Zhang (Amazon AGI, zhikang@amazon.com); David Fan (Meta FAIR, davidfan@meta.com); Xinyu Li (Amazon AGI, xxnl@amazon.com)
Pseudocode | No | The paper describes algorithmic steps and includes figures illustrating the architectures (Figure 2, Figure 4, Figure 6), but it does not contain a dedicated pseudocode or algorithm block.
Open Source Code | No | The paper does not explicitly state that the code is publicly available, nor does it link to a code repository. The NeurIPS checklist also answers 'No' for open access to code.
Open Datasets | Yes | LVU (Wu & Krähenbühl, 2021): It contains 30K videos sampled from 3K movies on the MovieClips (mov) website. Most videos are 1 to 3 minutes long... Breakfast (Kuehne et al., 2014): It provides 1,712 videos... COIN (Tang et al., 2019): It consists of 11,827 videos...
Dataset Splits | No | The paper uses standard datasets and mentions evaluation metrics, but it does not explicitly give the training, validation, and test splits as percentages or sample counts.
Hardware Specification | Yes | For experiments, we use 8 Tesla V100 GPUs and PyTorch.
Software Dependencies | No | The paper mentions using PyTorch but does not give a version number for it or for any other software dependency.
Experiment Setup | Yes | We use the AdamW (Loshchilov & Hutter, 2017) optimizer with a batch size of 16 and a weight decay of 0.01. We set the learning rate to 0.001. We train the network for 70 epochs using a cosine learning rate scheduler (Gotmare et al., 2018) with 10 epochs of warm-up.
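The Experiment Setup row can be made concrete with a small sketch of the stated recipe. The paper gives only the optimizer (AdamW), the weight decay (0.01), the base learning rate (0.001), 70 training epochs, and 10 warm-up epochs under a cosine schedule. Everything else here is an assumption of this sketch: a linear warm-up shape, per-epoch (rather than per-iteration) learning-rate updates, a cosine floor of zero, and the common Adam defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8. The function names `lr_at_epoch` and `adamw_step` are hypothetical. The code is plain Python on a scalar parameter so the arithmetic stays visible; it is not the authors' training code.

```python
import math


def lr_at_epoch(epoch, base_lr=1e-3, total_epochs=70, warmup_epochs=10, min_lr=0.0):
    """Learning rate for a 0-indexed epoch: warm-up, then cosine decay.

    Assumptions (not stated in the paper): linear warm-up, per-epoch
    granularity, and decay toward min_lr = 0.
    """
    if epoch < warmup_epochs:
        # Linear warm-up: base_lr / warmup_epochs at epoch 0, base_lr at the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs: progress runs from 0 toward 1.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def adamw_step(param, grad, m, v, t, lr,
               beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One AdamW update on a scalar parameter; t is the 1-indexed step count.

    beta1, beta2 and eps are assumed defaults; weight_decay = 0.01 is from the paper.
    """
    # Decoupled weight decay: shrink the parameter directly instead of
    # adding an L2 term to the gradient.
    param = param - lr * weight_decay * param
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized moment estimates.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Adam step using the scheduled learning rate.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    schedule = [lr_at_epoch(e) for e in range(70)]
    print("first:", schedule[0], "peak:", max(schedule), "last:", schedule[-1])
```

In an actual PyTorch run the same recipe would normally be expressed with `torch.optim.AdamW` plus a learning-rate scheduler; the sketch only spells out what "cosine schedule with 10 warm-up epochs" and "decoupled weight decay of 0.01" mean numerically under the assumptions above.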