Enhancing Cross-Modal Retrieval via Visual-Textual Prompt Hashing
Authors: Bingzhi Chen, Zhongqi Wu, Yishu Liu, Biqing Zeng, Guangming Lu, Zheng Zhang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple benchmark datasets consistently demonstrate the superiority and robustness of our VTPH method over state-of-the-art competitors. |
| Researcher Affiliation | Academia | (1) Beijing Institute of Technology, Zhuhai, China; (2) South China Normal University, Guangzhou, China; (3) Harbin Institute of Technology, Shenzhen, China |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | Three commonly used multi-label image-text cross-modal datasets, i.e., MIRFLICKR-25K [Huiskes and Lew, 2008], NUS-WIDE [Chua et al., 2009], and MS-COCO [Lin et al., 2014], are selected for our experiments. |
| Dataset Splits | No | Our settings follow the data splitting protocol used in [Tu et al., 2022; Liu et al., 2023b], with the details given in the supplementary document. The paper defers the split details to a supplementary document, which is not included in the provided text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Vision Transformer (ViT) [Dosovitskiy et al., 2021]' and 'GPT-2 [Radford et al., 2019]' as image and text encoders, as well as 'mPLUG [Li et al., 2022]', but does not provide version numbers for these or any other software dependencies (see the encoder sketch after the table). |
| Experiment Setup | Yes | Specifically, we focus on analyzing the effects of three hyper-parameters, i.e., α, β, and γ, as shown in Eqn. (9) and Eqn. (16). Through careful experimentation and analysis in Figure 4, it can be observed that our VTPH method achieves the best performance when α = 10, β = 8, and γ = 500, respectively (see the loss-weighting sketch after the table). |
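
Since the paper names its encoders (ViT and GPT-2) but gives no versions or checkpoints, the following is a minimal sketch of how such encoders could be instantiated, assuming the Hugging Face `transformers` library. The checkpoint names (`google/vit-base-patch16-224`, `gpt2`) and the pooling choices ([CLS] token for images, mean pooling for text) are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical encoder setup: checkpoints and pooling are assumptions;
# the paper specifies only "ViT" and "GPT-2" without versions.
from PIL import Image
from transformers import ViTModel, ViTImageProcessor, GPT2Model, GPT2Tokenizer

image_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
text_encoder = GPT2Model.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token

# Encode one image-text pair into modality-specific features.
image = Image.new("RGB", (224, 224))  # placeholder image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
img_feat = image_encoder(pixel_values).last_hidden_state[:, 0]   # [CLS] token

tokens = tokenizer("a dog on the beach", return_tensors="pt")
txt_feat = text_encoder(**tokens).last_hidden_state.mean(dim=1)  # mean pool
```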
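The reported hyper-parameters α = 10, β = 8, and γ = 500 weight terms of the objective in Eqn. (9) and Eqn. (16), which are not reproduced in this report. Below is a minimal sketch of such a weighted combination; the individual loss terms (`loss_a`, `loss_b`, `loss_c`, `loss_base`) are hypothetical placeholders, and only the weight values come from the paper.

```python
# Hypothetical loss weighting: term names are placeholders; the grouping of
# terms under Eqn. (9) vs. Eqn. (16) is unknown. Only alpha=10, beta=8,
# gamma=500 are reported in the paper (Figure 4).
import torch

alpha, beta, gamma = 10.0, 8.0, 500.0  # reported best hyper-parameters

def total_loss(loss_a: torch.Tensor,
               loss_b: torch.Tensor,
               loss_c: torch.Tensor,
               loss_base: torch.Tensor) -> torch.Tensor:
    """Weighted sum of loss terms, assuming alpha/beta/gamma act as
    trade-off coefficients on auxiliary objectives."""
    return loss_base + alpha * loss_a + beta * loss_b + gamma * loss_c

# Example usage with dummy scalar losses:
l = total_loss(torch.tensor(0.2), torch.tensor(0.1),
               torch.tensor(0.05), torch.tensor(1.0))
```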