Self-Chained Image-Language Model for Video Localization and Question Answering
Authors: Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the SeViLA framework on five challenging video question answering and event prediction benchmarks (NExT-QA, STAR, How2QA, TVQA, and VLEP) [75, 77, 36, 27, 28], where SeViLA outperforms several strong baselines/previous works, and achieves the state-of-the-art in both fine-tuning (NExT-QA and STAR) and zero-shot (NExT-QA, STAR, How2QA, and VLEP) settings. |
| Researcher Affiliation | Academia | UNC Chapel Hill {shoubin, jmincho, praty, mbansal}@cs.unc.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and checkpoints are available at: https://github.com/Yui010206/SeViLA |
| Open Datasets | Yes | Benchmarks. We evaluate our SeViLA framework on 3 video-language tasks, including multi-choice Video Question Answering (NExT-QA [77], STAR [75], How2QA [36], TVQA [27]), Video Event Prediction (VLEP [28]), and Moment Retrieval (QVHighlights [30]). |
| Dataset Splits | Yes | For NExT-QA, STAR, How2QA, TVQA, and VLEP we report the performance on the validation set, whereas for QVHighlights we report on the hidden test set. |
| Hardware Specification | Yes | We conduct experiments with 4 48GB A6000 GPUs. For Localizer pre-training, we pre-train the Localizer on QVHighlights for 80 epochs, taking approximately 12 hours with 29GB on 4 A6000 GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch, Huggingface Transformers, and Torchvision but does not specify their version numbers. |
| Experiment Setup | Yes | We report SeViLA framework training hyperparameters for Localizer pre-training, Answerer fine-tuning, and Localizer self-refinement in Table 11. |
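As noted in the Pseudocode row, the paper does not include a structured algorithm block. For orientation only, below is a minimal sketch of the Localizer-then-Answerer chaining that SeViLA describes: a Localizer scores sampled frames for relevance to the question, and an Answerer reads only the selected keyframes. The function names `score_frame_relevance`, `answer_from_keyframes`, and `sevila_infer`, the stub scoring logic, and the default `k` are hypothetical placeholders and assumptions, not the authors' released implementation (which lives in the repository linked above).

```python
from typing import Sequence

# Hypothetical stand-ins for SeViLA's BLIP-2-based Localizer and Answerer;
# the real models and checkpoints are in the authors' repo (Yui010206/SeViLA).
def score_frame_relevance(frame, question: str) -> float:
    """Localizer stub: score how relevant a single frame is to the question."""
    return 0.0  # placeholder score

def answer_from_keyframes(keyframes: Sequence, question: str, options: Sequence[str]) -> str:
    """Answerer stub: answer a multi-choice question from the selected keyframes."""
    return options[0]  # placeholder answer

def sevila_infer(frames: Sequence, question: str, options: Sequence[str], k: int = 4) -> str:
    """Chain Localizer -> Answerer: keep the top-k scored frames, then answer from them."""
    scored = [(score_frame_relevance(f, question), i) for i, f in enumerate(frames)]
    top_idx = sorted(i for _, i in sorted(scored, reverse=True)[:k])  # keep temporal order
    keyframes = [frames[i] for i in top_idx]
    return answer_from_keyframes(keyframes, question, options)
```

The self-refinement stage referenced in the Experiment Setup row, in which the Localizer is further trained on pseudo-labels derived from Answerer feedback, is omitted from this sketch.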