Contrastive Transformer Cross-Modal Hashing for Video-Text Retrieval

Authors: Xiaobo Shen, Qianxin Huang, Long Lan, Yuhui Zheng

IJCAI 2024

Each entry below gives a reproducibility variable, the extracted result, and the LLM's supporting response.
Research Type: Experimental. "The experimental results on three video benchmark datasets demonstrate that CTCH outperforms the state-of-the-arts in video-text retrieval tasks."
Researcher Affiliation: Academia. Xiaobo Shen (Nanjing University of Science and Technology), Qianxin Huang (Nanjing University of Science and Technology), Long Lan (National University of Defense Technology), Yuhui Zheng (Qinghai Normal University).
Pseudocode: Yes. "Algorithm 1 Video augmentation; Algorithm 2 Contrastive Transformer Cross-modal Hashing"
Open Source Code: No. The paper does not include an explicit statement or link indicating that source code for the proposed CTCH method is openly available. It mentions a "pre-trained bidirectional transformer based hash model" and Hugging Face for the text transformer, but these are third-party components rather than the authors' own implementation.
Open Datasets: Yes. "MSR-VTT [Xu et al., 2016] is the largest general video captioning dataset. ... Following [Xu et al., 2016], we randomly choose 6,513 and 2,990 clips for training and testing respectively. ActivityNet Captions v1.2 [Krishna et al., 2017] is a large-scale video dataset... We randomly choose 4,816 and 2,382 videos for training and testing respectively. Charades [Sigurdsson et al., 2016] is a dataset... We choose 7985 and 1863 videos for training and testing respectively."
Dataset Splits: Yes. "Following [Xu et al., 2016], we randomly choose 6,513 and 2,990 clips for training and testing respectively. ... We randomly choose 4,816 and 2,382 videos for training and testing respectively. ... We choose 7985 and 1863 videos for training and testing respectively. Since test set does not provide labels, we use validation set for testing."
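As a concrete illustration of the random splits reported above, the sketch below partitions MSR-VTT's 10,000 clips into 6,513 training and 2,990 test clips. The `split_clips` helper and the fixed seed are assumptions for illustration only; the paper does not release a split script.

```python
import random

def split_clips(n_total, n_train, n_test, seed=0):
    """Randomly partition clip indices into disjoint train and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:n_train + n_test]

# MSR-VTT-style split: 6,513 training and 2,990 test clips out of the
# dataset's 10,000 clips (the remainder is left for validation).
train_ids, test_ids = split_clips(10_000, 6_513, 2_990)
```

Fixing the seed makes the split reproducible across runs, which is what a released split script would buy over the paper's unspecified random choice.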
Hardware Specification: No. The paper does not provide hardware details (e.g., GPU model, CPU type, memory) used to run the experiments. It only mentions VGG-16 for feature extraction and the Adam optimizer, which are software/methods, not hardware.
Software Dependencies: No. The paper mentions software such as VGG-16, ImageNet, Hugging Face, and the Adam optimizer, but does not specify version numbers for these or for other key dependencies (e.g., Python or PyTorch versions).
Experiment Setup: Yes. "The batch size, number of epochs, and learning rate are set to 256, 200, and 1e-4 respectively. The parameters α, β, and γ are set to 400, 0.05 and 0.05 respectively. The temperature coefficient is set to 0.2. The proposed CTCH is optimized using Adam optimizer."
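The reported temperature coefficient is the kind of hyperparameter that enters a contrastive objective. Below is a minimal pure-Python sketch of an InfoNCE-style contrastive loss using the stated temperature 0.2, with the other quoted hyperparameters recorded as constants; the loss form and embedding handling are illustrative assumptions, not the authors' released implementation.

```python
import math

# Hyperparameters quoted from the paper's setup.
BATCH_SIZE, EPOCHS, LEARNING_RATE, TAU = 256, 200, 1e-4, 0.2

def info_nce(video_emb, text_emb, tau=TAU):
    """Video-to-text InfoNCE loss over a batch of paired embeddings.

    Matched video/text pairs share an index; each video's matched caption
    should score highest among all captions in the batch. The symmetric
    text-to-video term is analogous.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    V = [normalize(v) for v in video_emb]
    T = [normalize(t) for t in text_emb]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))

    losses = []
    for i, v in enumerate(V):
        logits = [dot(v, t) / tau for t in T]  # scaled cosine similarities
        m = max(logits)                        # shift for numerical stability
        lse = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(lse - logits[i])         # -log softmax at the match
    return sum(losses) / len(losses)
```

With aligned one-hot embeddings the loss is close to zero, while shuffling the text side raises it sharply; that gap is exactly what the contrastive objective optimizes, and a lower temperature widens it.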