Glance and Focus: Memory Prompting for Multi-Event Video Question Answering

Authors: Ziyi Bai, Ruiping Wang, Xilin Chen

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on four Multi-Event Video QA benchmarks, including STAR, EgoTaskQA, AGQA, and NExT-QA. Our proposed model achieves state-of-the-art results, surpassing current large models on various challenging reasoning tasks. |
| Researcher Affiliation | Academia | 1. Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; 2. University of Chinese Academy of Sciences, Beijing, 100049, China |
| Pseudocode | No | The paper does not contain a pseudocode block or a clearly labeled algorithm block. |
| Open Source Code | Yes | The code and models are available at https://github.com/ByZ0e/Glance-Focus. |
| Open Datasets | Yes | We conduct extensive experiments on four Multi-Event Video QA benchmarks, including STAR, EgoTaskQA, AGQA, and NExT-QA. |
| Dataset Splits | Yes | For each benchmark, we follow the standard protocols outlined by prior works for data processing, metrics, and settings to ensure fair comparisons. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA GeForce RTX 3090 Ti GPU. |
| Software Dependencies | No | The paper mentions software components such as S3D, C3D, Faster R-CNN, CLIP, Transformer, RoBERTa, and the Adam optimizer, but does not provide version numbers for these or other ancillary software dependencies. |
| Experiment Setup | Yes | We employ a standard 2-layer, 8-head Transformer encoder-decoder with hidden size D of 512 as the backbone for our Glance-Focus model. ... For training details, we use a dropout of 0.1 and initialize model weights using Xavier init [13]. The Adam optimizer [20] is used with a learning rate of 5e-6 to optimize model parameters. |
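The reported setup (2-layer, 8-head Transformer encoder-decoder, hidden size 512, dropout 0.1, Xavier init, Adam with lr 5e-6) can be sketched in PyTorch. This is a minimal illustration of the stated hyperparameters, not the authors' implementation; the feed-forward width and the dummy input sizes are assumptions, since the paper excerpt does not specify them.

```python
import torch
import torch.nn as nn

# Sketch of the backbone described in the Experiment Setup row:
# 2-layer, 8-head Transformer encoder-decoder with hidden size D = 512,
# dropout 0.1, Xavier-initialized weights, Adam with learning rate 5e-6.
D = 512

backbone = nn.Transformer(
    d_model=D,
    nhead=8,
    num_encoder_layers=2,
    num_decoder_layers=2,
    dim_feedforward=2048,  # assumed; the excerpt does not state the FFN size
    dropout=0.1,
    batch_first=True,
)

# Xavier initialization for all weight matrices, per the training details
for p in backbone.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

optimizer = torch.optim.Adam(backbone.parameters(), lr=5e-6)

# Sanity check with dummy inputs (batch of 2; 32 video-frame features as the
# encoder input, 8 query embeddings as the decoder input -- sizes are illustrative)
src = torch.randn(2, 32, D)
tgt = torch.randn(2, 8, D)
out = backbone(src, tgt)
print(out.shape)  # torch.Size([2, 8, 512])
```

With `batch_first=True`, the decoder output keeps the query-sequence shape, so each of the 8 decoder queries yields one 512-dimensional representation per sample.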