Exploiting Visual Semantic Reasoning for Video-Text Retrieval

Authors: Zerun Feng, Zhimin Zeng, Caili Guo, Zheng Li

IJCAI 2020

Reproducibility Variable / Result / LLM Response

Research Type: Experimental
    "Extensive experiments on two public benchmark datasets validate the effectiveness of our method by achieving state-of-the-art performance due to the powerful semantic reasoning."

Researcher Affiliation: Academia
    1 Beijing Key Laboratory of Network System Architecture and Convergence, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China; 2 Beijing Laboratory of Advanced Information Networks, Beijing, China

Pseudocode: No
    The paper describes its method with mathematical equations and textual explanations, but it includes no structured pseudocode or algorithm blocks.

Open Source Code: No
    The paper contains no explicit statement about releasing source code for the described method, nor a link to a code repository.

Open Datasets: Yes
    "We evaluate our model on two benchmark datasets: the MSRVTT dataset [Xu et al., 2016] and the MSVD dataset [Chen and Dolan, 2011]."

Dataset Splits: Yes
    "We follow the identical partition strategy in [Dong et al., 2019] for training, testing and validation in our experiments."

Hardware Specification: No
    The paper provides no specific hardware details, such as the GPU or CPU models used to run the experiments.

Software Dependencies: No
    The paper mentions software components such as the Adam optimizer, ResNet-101, and Faster R-CNN, but it gives no version numbers for any software dependencies or libraries.

Experiment Setup: Yes
    "We uniformly sample 16 frames from videos with the same time interval between every two frames. The number n of regions within a frame is 36, identical to [Anderson et al., 2018]. The dimension d of region features extracted from ResNet-101 is 2048. ... We set the word embedding size to 500 and the dimension of the common space D to 2048, similar to [Dong et al., 2019]. The margin parameter α is empirically chosen to be 0.2. The size of a mini-batch is 64. The optimizer in the training procedure is Adam with 50 epochs at most. We start training with an initial learning rate 0.0001, and the adjustment schedule is that once the validation loss does not decrease in three consecutive epochs, the learning rate is divided by 2."
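The learning-rate schedule quoted in the setup above (start at 0.0001, halve whenever the validation loss fails to decrease for three consecutive epochs) can be sketched in plain Python. This is a hypothetical reimplementation based only on the quoted description, not the authors' code; the function name `schedule_lr` and its parameter names are my own. In a PyTorch training loop the same rule would roughly correspond to `ReduceLROnPlateau(factor=0.5)` with an appropriate patience.

```python
def schedule_lr(val_losses, init_lr=1e-4, patience=3, factor=0.5):
    """Return the learning rate in effect after observing `val_losses`.

    Starts at `init_lr` and multiplies the rate by `factor` each time
    the validation loss has not decreased for `patience` consecutive
    epochs, as described in the paper's training schedule.
    """
    lr = init_lr
    best = float("inf")
    stall = 0  # consecutive epochs without improvement
    for loss in val_losses:
        if loss < best:
            best = loss
            stall = 0
        else:
            stall += 1
            if stall == patience:
                lr *= factor  # halve the learning rate
                stall = 0     # restart the stall counter
    return lr
```

For example, steadily decreasing losses leave the rate at 1e-4, while three flat epochs in a row trigger one halving to 5e-5.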