Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
Authors: Hailang Huang, Zhijie Nie, Ziqiao Wang, Ziyu Shang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on various image-text retrieval models and datasets, the authors demonstrate that their method consistently improves image-text retrieval performance and achieves new state-of-the-art results. |
| Researcher Affiliation | Academia | 1SKLSDE, School of Computer Science and Engineering, Beihang University, Beijing, China 2Shen Yuan Honors College, Beihang University, Beijing, China 3School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada 4School of Computer Science and Engineering, Southeast University, Nanjing, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and supplementary files can be found at https://github.com/lerogo/aaai24_itr_cusa. |
| Open Datasets | Yes | For image-text retrieval, we evaluate our approach on three datasets: Flickr30K (Young et al. 2014), MSCOCO (Lin et al. 2014), and ECCV Caption (Chun et al. 2022). |
| Dataset Splits | No | The paper mentions test sets for datasets like MSCOCO (5K Test Set) and Flickr30K (1K Test Set) and other test sets for image retrieval and STS tasks, but it does not provide specific training/validation split percentages or sample counts for these datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Unicom-ViT-B/32' and 'all-mpnet-base-v2' but does not specify exact version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use the above two losses, CSA and USA, together to adjust the original loss of the ITR model, so the overall loss function is expressed as: L_CUSA = L_original + α·L_CSA + β·L_USA, where α and β are loss weights ranging from 0.1 to 1.0. |
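The combined loss quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `cusa_loss` and the default weights are assumptions for the sketch, and the three loss terms stand in for the ITR model's original loss and the paper's CSA/USA alignment losses.

```python
def cusa_loss(l_original: float, l_csa: float, l_usa: float,
              alpha: float = 0.5, beta: float = 0.5) -> float:
    """Combine the original ITR loss with the CSA and USA terms:
    L_CUSA = L_original + alpha * L_CSA + beta * L_USA.

    The paper reports tuning the loss weights alpha and beta in the
    range 0.1 to 1.0; the defaults here are illustrative only.
    """
    if not (0.1 <= alpha <= 1.0 and 0.1 <= beta <= 1.0):
        raise ValueError("alpha and beta are expected in [0.1, 1.0]")
    return l_original + alpha * l_csa + beta * l_usa
```

In practice each term would be a scalar tensor produced by the respective loss module in a training step, with the weighted sum backpropagated as usual.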