Slot-VLM: Object-Event Slots for Video-Language Modeling

Authors: Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate the effectiveness of our Slot-VLM, which achieves state-of-the-art performance on video question-answering.
Researcher Affiliation | Collaboration | Jiaqi Xu (1), Cuiling Lan (2), Wenxuan Xie (2), Xuejin Chen (1), Yan Lu (2). (1) University of Science and Technology of China; (2) Microsoft Research Asia. xujiaqi@mail.ustc.edu.cn, {culan,wenxie,yanlu}@microsoft.com, xjchen99@ustc.edu.cn
Pseudocode | No | The paper describes its methodology in natural language and with diagrams, but does not include any formal pseudocode or algorithm blocks. (A generic slot-attention sketch follows this table.)
Open Source Code | No | This paper is the result of an open source research project starting from October, 2023. [...] We will release the code.
Open Datasets | Yes | We use the Video Instruction Data, collected by [28], for video instruction tuning. [...] We evaluate the performance on three open-ended video question-answering (QA) benchmarks: MSVD-QA [9], MSRVTT-QA [44], and ActivityNet-QA [7].
Dataset Splits | No | The paper mentions instruction tuning and evaluation on a test set, but does not explicitly describe a separate validation split with specific percentages or counts.
Hardware Specification | Yes | All models are trained using a single NVIDIA A100 80GB GPU.
Software Dependencies | No | The paper mentions AdamW as the optimizer but does not specify programming languages, libraries, or other software dependencies with version numbers.
Experiment Setup | Yes | In our experiments, we set N_o and N_e to 8 by default unless otherwise specified. [...] The linear projection layers S-Proj., F-Proj., and Proj. consist of 1024, 1024, and 4096 neurons, respectively. [...] We train the models for 60 epochs with a learning rate of 1e-4. [...] We set the learning rate to 2e-5. We adopt the cosine annealing learning rate. We set the batch size to 40 and train on a single A100 GPU. (These settings are assembled into a runnable configuration sketch below.)
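
Since the paper provides no pseudocode, the following PyTorch module sketches the generic slot-attention mechanism (Locatello et al., 2020) that object/event slot designs of this kind commonly build on. This is a hedged illustration, not the authors' unreleased implementation: the slot dimension of 1024 (matching S-Proj.), the three refinement iterations, and the module structure are all assumptions.

```python
# Minimal, generic slot-attention sketch (after Locatello et al., 2020).
# NOT the authors' code; dimensions and iteration count are assumptions.
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, num_slots=8, dim=1024, iters=3, eps=1e-8):
        super().__init__()
        self.num_slots = num_slots   # paper default: N_o = N_e = 8
        self.iters = iters
        self.eps = eps
        self.scale = dim ** -0.5

        # Learned Gaussian from which initial slots are sampled.
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))

        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)

    def forward(self, inputs):
        # inputs: (batch, num_tokens, dim) visual tokens from a frozen encoder.
        b, n, d = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)

        # Sample initial slots from the learned Gaussian.
        mu = self.slots_mu.expand(b, self.num_slots, -1)
        sigma = self.slots_logsigma.exp().expand(b, self.num_slots, -1)
        slots = mu + sigma * torch.randn_like(mu)

        for _ in range(self.iters):
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))
            attn = torch.einsum('bsd,btd->bst', q, k) * self.scale
            # Softmax over the SLOT axis, the key difference from standard
            # cross-attention: slots compete for the input tokens.
            attn = attn.softmax(dim=1) + self.eps
            attn = attn / attn.sum(dim=-1, keepdim=True)
            updates = torch.einsum('bst,btd->bsd', attn, v)
            slots = self.gru(updates.reshape(-1, d),
                             slots_prev.reshape(-1, d)).reshape(b, -1, d)
            slots = slots + self.mlp(self.norm_mlp(slots))
        return slots  # (batch, num_slots, dim) compact slot tokens
```

With the paper's default of 8 slots and 1024-dimensional tokens, `SlotAttention()(torch.randn(2, 256, 1024))` returns a `(2, 8, 1024)` tensor, i.e., 256 visual tokens are condensed into 8 slot embeddings per sample.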
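
The Experiment Setup row pins down layer widths and optimization hyperparameters; the sketch below assembles them into a runnable PyTorch configuration. The grouping of the two learning rates into two training phases, the use of AdamW in both, the 1024-dimensional input widths, and the variable names (`s_proj`, `f_proj`, `proj`) are assumptions inferred from the quoted text, not details confirmed by the paper.

```python
# Hedged configuration sketch assembled from the reported hyperparameters.
import torch
from torch import nn, optim

# Projection widths as reported: S-Proj. and F-Proj. have 1024 neurons,
# Proj. has 4096 (the 1024-dim inputs are assumed, matching the slot width).
s_proj = nn.Linear(1024, 1024)
f_proj = nn.Linear(1024, 1024)
proj = nn.Linear(1024, 4096)

params = (list(s_proj.parameters()) + list(f_proj.parameters())
          + list(proj.parameters()))

# Phase 1 (assumed to be alignment pretraining): 60 epochs at lr 1e-4.
opt_phase1 = optim.AdamW(params, lr=1e-4)

# Phase 2 (assumed to be video instruction tuning): lr 2e-5 with cosine
# annealing, batch size 40 on a single A100. T_max=60 is an assumption;
# the quoted text does not state the phase-2 epoch count.
opt_phase2 = optim.AdamW(params, lr=2e-5)
sched_phase2 = optim.lr_scheduler.CosineAnnealingLR(opt_phase2, T_max=60)

loader_kwargs = dict(batch_size=40, shuffle=True)  # reported batch size
```

The missing pieces a full reproduction would still need, as the Software Dependencies and Dataset Splits rows note, are library versions and the exact train/validation partitioning.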