Contrastive Transformer Cross-Modal Hashing for Video-Text Retrieval
Authors: Xiaobo Shen, Qianxin Huang, Long Lan, Yuhui Zheng
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results on three video benchmark datasets demonstrate that CTCH outperforms the state of the art in video-text retrieval tasks. |
| Researcher Affiliation | Academia | Xiaobo Shen (Nanjing University of Science and Technology), Qianxin Huang (Nanjing University of Science and Technology), Long Lan (National University of Defense Technology), and Yuhui Zheng (Qinghai Normal University) |
| Pseudocode | Yes | Algorithm 1 Video augmentation; Algorithm 2 Contrastive Transformer Cross-modal Hashing |
| Open Source Code | No | The paper does not include an explicit statement or link indicating that the source code for the proposed CTCH method is openly available. It mentions a 'pre-trained bidirectional transformer based hash model' and 'Hugging Face' for the text transformer, but these refer to third-party components rather than the authors' own implementation. |
| Open Datasets | Yes | MSR-VTT [Xu et al., 2016] is the largest general video captioning dataset. ... Following [Xu et al., 2016], we randomly choose 6,513 and 2,990 clips for training and testing respectively. ActivityNet Captions v1.2 [Krishna et al., 2017] is a large-scale video dataset... We randomly choose 4,816 and 2,382 videos for training and testing respectively. Charades [Sigurdsson et al., 2016] is a dataset... We choose 7,985 and 1,863 videos for training and testing respectively. |
| Dataset Splits | Yes | Following [Xu et al., 2016], we randomly choose 6,513 and 2,990 clips for training and testing respectively. ... We randomly choose 4,816 and 2,382 videos for training and testing respectively. ... We choose 7,985 and 1,863 videos for training and testing respectively. Since the test set does not provide labels, we use the validation set for testing. (A hedged split-reconstruction sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory specifications) used to run the experiments. It only mentions using VGG-16 for feature extraction and the Adam optimizer for training, which are models and methods rather than hardware. |
| Software Dependencies | No | The paper mentions components such as VGG-16, ImageNet, Hugging Face, and the Adam optimizer, but does not specify version numbers for these or for other key software dependencies (e.g., Python or PyTorch versions). |
| Experiment Setup | Yes | The batch size, number of epochs, and learning rate are set to 256, 200, and 1e-4, respectively. The parameters α, β, and γ are set to 400, 0.05, and 0.05, respectively. The temperature coefficient is set to 0.2. The proposed CTCH is optimized using the Adam optimizer. (A hedged configuration sketch follows the table.) |
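
The quoted split sizes are the only split information the paper provides; no random seed or sampling code is reported. As referenced in the Dataset Splits row, here is a minimal Python sketch of how such random splits could be reconstructed. The `make_split` helper and its fixed seed are assumptions, not the authors' procedure:

```python
import random

# Split sizes quoted in the Dataset Splits row above. The sampling
# procedure and the fixed seed are assumptions: the paper reports
# neither, which is exactly the reproducibility gap being flagged.
SPLITS = {
    "MSR-VTT": {"train": 6513, "test": 2990},
    "ActivityNet Captions v1.2": {"train": 4816, "test": 2382},
    "Charades": {"train": 7985, "test": 1863},
}

def make_split(video_ids, n_train, n_test, seed=0):
    """Randomly partition video ids into train/test sets of the quoted sizes."""
    rng = random.Random(seed)  # fixed seed only so this sketch is repeatable
    ids = list(video_ids)
    rng.shuffle(ids)
    assert len(ids) >= n_train + n_test, "not enough videos for this split"
    return ids[:n_train], ids[n_train:n_train + n_test]

# Example: an MSR-VTT-sized split over dummy ids (6,513 + 2,990 = 9,503).
train_ids, test_ids = make_split(range(9503), 6513, 2990)
```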
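
The Experiment Setup row fixes the reported hyperparameters but not the implementation. Below is a minimal sketch, assuming PyTorch and a standard InfoNCE-style contrastive loss at the reported temperature of 0.2; the placeholder model, the 512/64 dimensions, and the exact loss formulation are assumptions, since the authors' code is not released:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters quoted in the Experiment Setup row. CTCH itself is not
# released, so nn.Linear below is a trivial placeholder hash head, not
# the authors' model.
CONFIG = dict(batch_size=256, epochs=200, lr=1e-4,
              alpha=400.0, beta=0.05, gamma=0.05, temperature=0.2)

model = nn.Linear(512, 64)  # placeholder: 512-d features -> 64-bit codes
optimizer = torch.optim.Adam(model.parameters(), lr=CONFIG["lr"])

def info_nce(video_emb, text_emb, tau=CONFIG["temperature"]):
    """InfoNCE-style contrastive loss at the reported temperature (0.2);
    the i-th video and i-th text in a batch are treated as a positive pair."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / tau                 # scaled cosine similarities
    targets = torch.arange(v.size(0))      # diagonal entries are positives
    return F.cross_entropy(logits, targets)
```

Training would then run for the quoted 200 epochs at batch size 256, with α, β, and γ weighting the remaining loss terms; how those terms are combined is specified in the paper's Algorithm 2 and is not reproduced here.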