Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval

Authors: Hailang Huang, Zhijie Nie, Ziqiao Wang, Ziyu Shang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on various image-text retrieval models and datasets, we demonstrate that our method can consistently improve the performance of image-text retrieval and achieve new state-of-the-art results.
Researcher Affiliation | Academia | 1. SKLSDE, School of Computer Science and Engineering, Beihang University, Beijing, China; 2. Shen Yuan Honors College, Beihang University, Beijing, China; 3. School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada; 4. School of Computer Science and Engineering, Southeast University, Nanjing, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and supplementary files can be found at https://github.com/lerogo/aaai24_itr_cusa.
Open Datasets | Yes | For image-text retrieval, we evaluate our approach on three datasets: Flickr30K (Young et al. 2014), MSCOCO (Lin et al. 2014), and ECCV Caption (Chun et al. 2022).
Dataset Splits | No | The paper mentions test sets such as the MSCOCO 5K test set and the Flickr30K 1K test set, along with test sets for image retrieval and STS tasks, but it does not provide training/validation split percentages or sample counts for these datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'Unicom-ViT-B/32' and 'all-mpnet-base-v2' but does not specify exact version numbers for these or other software dependencies.
Experiment Setup | Yes | We use the above two losses, CSA and USA, together to adjust the original loss of the ITR model, so the overall loss function is expressed as: L_CUSA = L_original + α·L_CSA + β·L_USA, where α and β are loss weights ranging from 0.1 to 1.0.
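The overall objective in the row above is a simple weighted sum of three terms. As a minimal sketch, the combination can be written as a plain function; the argument names (original_loss, csa_loss, usa_loss) are illustrative placeholders for the paper's L_original, L_CSA, and L_USA terms, not names from the released code.

```python
def cusa_loss(original_loss: float, csa_loss: float, usa_loss: float,
              alpha: float = 0.1, beta: float = 0.1) -> float:
    """Combined objective L_CUSA = L_original + alpha * L_CSA + beta * L_USA.

    alpha and beta are scalar loss weights; per the paper they are
    chosen from the range [0.1, 1.0].
    """
    return original_loss + alpha * csa_loss + beta * usa_loss

# Example with dummy scalar loss values and alpha=0.1, beta=1.0:
total = cusa_loss(1.0, 0.4, 0.2, alpha=0.1, beta=1.0)  # 1.0 + 0.04 + 0.2 = 1.24
```

In a training loop the same expression would be applied to batch-level loss tensors before backpropagation; only the two scalar weights are tuned.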