Artemis: Towards Referential Understanding in Complex Videos

Authors: Jihao Qiu, Yuan Zhang, Xi Tang, Lingxi Xie, Tianren Ma, Pengyu Yan, David Doermann, Qixiang Ye, Yunjie Tian

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results are promising both quantitatively and qualitatively. Additionally, we show that Artemis can be integrated with video grounding and text summarization tools to understand more complex scenarios. Code and data are available at https://github.com/qiujihao19/Artemis.
Researcher Affiliation | Academia | (1) University of Chinese Academy of Sciences; (2) University at Buffalo
Pseudocode | No | Not found. The paper describes the methodology in text and provides a framework diagram in Figure 2, but no explicit pseudocode or algorithm blocks are present.
Open Source Code | Yes | Code and data are available at https://github.com/qiujihao19/Artemis.
Open Datasets | Yes | We collect video data for referential understanding from 7 datasets, including HC-STVG [44], VIDSentence [10], A2D Sentences [20], LaSOT [18], MeViS [16], GOT10K [24], and MGIT [23].
Dataset Splits | Yes | The validation portion, containing 3,400 video clips, evaluates Artemis’s ability.
Hardware Specification | Yes | This efficient design requires only 28 hours (3 hours for the final stage) on 8 NVIDIA-A800 GPUs.
Software Dependencies | No | The paper mentions specific models such as Vicuna-7B v1.5 and CLIP ViT-L/14, and the AdamW optimizer, but does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | The training procedure of Artemis comprises three steps: (1) video-text pre-training, (2) video-based instruction tuning, and (3) video-based referring. We report the detailed training hyper-parameters of Artemis in Table 6. Table 6 includes: peak learning rate (1e-3, 2e-5, 4e-5), LoRA rank (16), image resolution (224), global batch size (256, 128, 48), numerical precision (bfloat16, float16), etc.
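For readers reconstructing the setup, the quoted Table 6 values can be collected into a per-stage configuration. The sketch below is a minimal Python summary, not code from the released repository: it assumes the listed learning rates and batch sizes map to the three training stages in the order given, and the assignment of bfloat16 versus float16 to specific stages is a guess, since the quoted row does not say which stage uses which precision.

```python
# Hypothetical per-stage training configuration assembled from the Table 6
# values quoted above. Stage-to-value assignment is an assumption, not taken
# from the paper or the released code.
from dataclasses import dataclass


@dataclass
class StageConfig:
    name: str
    peak_lr: float            # peak learning rate (optimizer: AdamW)
    global_batch_size: int
    precision: str            # numerical precision; bfloat16/float16 split is assumed
    lora_rank: int = 16       # LoRA rank, shared across stages per Table 6
    image_resolution: int = 224  # input resolution of the CLIP ViT-L/14 encoder


# Assumed to follow the order of values quoted from Table 6:
# learning rates (1e-3, 2e-5, 4e-5) and global batch sizes (256, 128, 48).
ARTEMIS_STAGES = [
    StageConfig("video-text pre-training", peak_lr=1e-3, global_batch_size=256, precision="bfloat16"),
    StageConfig("video-based instruction tuning", peak_lr=2e-5, global_batch_size=128, precision="bfloat16"),
    StageConfig("video-based referring", peak_lr=4e-5, global_batch_size=48, precision="float16"),
]

if __name__ == "__main__":
    for stage in ARTEMIS_STAGES:
        print(stage)
```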