Enhancing Cross-Modal Retrieval via Visual-Textual Prompt Hashing
Authors: Bingzhi Chen, Zhongqi Wu, Yishu Liu, Biqing Zeng, Guangming Lu, Zheng Zhang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple benchmark datasets consistently demonstrate the superiority and robustness of our VTPH method over state-of-the-art competitors. |
| Researcher Affiliation | Academia | (1) Beijing Institute of Technology, Zhuhai, China; (2) South China Normal University, Guangzhou, China; (3) Harbin Institute of Technology, Shenzhen, China |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | Three commonly used multi-label image-text cross-modal datasets, i.e., MIRFLICKR-25K [Huiskes and Lew, 2008], NUS-WIDE [Chua et al., 2009], and MS-COCO [Lin et al., 2014], are selected for our experiments. |
| Dataset Splits | No | Our settings follow the data splitting protocol used in [Tu et al., 2022; Liu et al., 2023b], with the details given in the supplementary document. The paper defers the split details to a supplementary document, which is not included in the provided text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Vision Transformer (ViT) [Dosovitskiy et al., 2021]' and 'GPT-2 [Radford et al., 2019]' as image and text encoders, as well as 'mPLUG [Li et al., 2022]', but does not provide version numbers for these or any other software dependencies (see the encoder sketch after the table). |
| Experiment Setup | Yes | Specifically, we focus on analyzing the effects of three hyper-parameters, i.e., α, β, and γ, as shown in Eqn. (9) and Eqn. (16). Through careful experimentation and analysis in Figure 4, it can be observed that our VTPH method achieves the best performance when α = 10, β = 8, and γ = 500, respectively (see the loss-weighting sketch after the table). |
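
Since the paper names its encoders (ViT and GPT-2) but gives no versions or checkpoints, the following is a minimal sketch of how such encoders could be instantiated, assuming the Hugging Face `transformers` library. The checkpoint names (`google/vit-base-patch16-224`, `gpt2`) and the pooling choices ([CLS] token for images, mean pooling for text) are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical encoder setup: checkpoints and pooling are assumptions;
# the paper specifies only "ViT" and "GPT-2" without versions.
from PIL import Image
from transformers import ViTModel, ViTImageProcessor, GPT2Model, GPT2Tokenizer

image_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
text_encoder = GPT2Model.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token

# Encode one image-text pair into modality-specific features.
image = Image.new("RGB", (224, 224))  # placeholder image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
img_feat = image_encoder(pixel_values).last_hidden_state[:, 0]   # [CLS] token

tokens = tokenizer("a dog on the beach", return_tensors="pt")
txt_feat = text_encoder(**tokens).last_hidden_state.mean(dim=1)  # mean pool
```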
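The reported hyper-parameters α = 10, β = 8, and γ = 500 weight terms of the objective in Eqn. (9) and Eqn. (16), which are not reproduced in this report. Below is a minimal sketch of such a weighted combination; the individual loss terms (`loss_a`, `loss_b`, `loss_c`, `loss_base`) are hypothetical placeholders, and only the weight values come from the paper.

```python
# Hypothetical loss weighting: term names are placeholders; the grouping of
# terms under Eqn. (9) vs. Eqn. (16) is unknown. Only alpha=10, beta=8,
# gamma=500 are reported in the paper (Figure 4).
import torch

alpha, beta, gamma = 10.0, 8.0, 500.0  # reported best hyper-parameters

def total_loss(loss_a: torch.Tensor,
               loss_b: torch.Tensor,
               loss_c: torch.Tensor,
               loss_base: torch.Tensor) -> torch.Tensor:
    """Weighted sum of loss terms, assuming alpha/beta/gamma act as
    trade-off coefficients on auxiliary objectives."""
    return loss_base + alpha * loss_a + beta * loss_b + gamma * loss_c

# Example usage with dummy scalar losses:
l = total_loss(torch.tensor(0.2), torch.tensor(0.1),
               torch.tensor(0.05), torch.tensor(1.0))
```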