Negative Pre-aware for Noisy Cross-Modal Matching

Authors: Xu Zhang, Hao Li, Mang Ye

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our method significantly improves matching accuracy and performance stability at increasing noise ratio. Our approach also surpasses the state-of-the-art methods by a large margin.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, University of Electronic Science and Technology of China; 2 School of Computer Science, Wuhan University
Pseudocode | No | The paper does not contain structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The code is available at: https://github.com/ZhangXu0963/NPC.
Open Datasets | Yes | The proposed NPC is evaluated on three benchmark datasets: MSCOCO (Lin et al. 2014), Flickr30K (Young et al. 2014), and CC120K, a subset randomly sampled from the real-world dataset Conceptual Captions (Sharma et al. 2018).
Dataset Splits | Yes | MSCOCO contains 123,287 images with 5 annotated captions per image; following previous works (Huang et al. 2021), 113,287 images are used for training, 5,000 for validation, and 5,000 for testing. Flickr30K contains 31,783 images with 5 annotated captions per image; following the same works, 29,783 images are used for training, 1,000 for validation, and 1,000 for testing. CC120K contains 120,851 images with a single caption per image; 118,851 images are used for training, 1,000 for validation, and 1,000 for testing.
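
For reference, the reported splits can be summarized in a small Python sketch. The dictionary layout and key names below are illustrative summaries of the numbers quoted above, not identifiers from the official NPC repository.

# Illustrative summary of the train/val/test splits reported in the paper.
# Structure and key names are assumptions made for readability only.
DATASET_SPLITS = {
    "MSCOCO":    {"train": 113_287, "val": 5_000, "test": 5_000, "captions_per_image": 5},
    "Flickr30K": {"train": 29_783,  "val": 1_000, "test": 1_000, "captions_per_image": 5},
    "CC120K":    {"train": 118_851, "val": 1_000, "test": 1_000, "captions_per_image": 1},
}

if __name__ == "__main__":
    for name, split in DATASET_SPLITS.items():
        total = split["train"] + split["val"] + split["test"]
        print(f"{name}: {total} images "
              f"({split['train']} train / {split['val']} val / {split['test']} test)")

Summing each row reproduces the totals quoted in the paper (123,287 for MSCOCO, 31,783 for Flickr30K, and 120,851 for CC120K).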
Hardware Specification | Yes | Both the baseline and NPC are trained on a single RTX 3090 GPU, optimized by AdamW (Loshchilov and Hutter 2019).
Software Dependencies | No | The paper mentions using CLIP and the AdamW optimizer but does not specify software versions for programming languages or libraries (e.g., Python or PyTorch versions).
Experiment Setup | Yes | We start training CLIP and NPC with learning rates of 5e-7 and 2e-7, respectively, and a weight decay of 0.2. In all experiments, we train the model for 5 epochs with a mini-batch size of 256, and the hyperparameter τ is set to 0.99.
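
As a rough illustration of this setup, a minimal PyTorch sketch with two AdamW parameter groups (one for the CLIP backbone at 5e-7, one for the NPC-specific parameters at 2e-7) could look like the following. The module names and the exact split into parameter groups are assumptions; the paper only reports the hyperparameter values, and the official NPC code may organize training differently.

import torch

# Hypothetical sketch of the reported training configuration.
# `clip_model` and `npc_head` are placeholder modules standing in for the
# CLIP backbone and the NPC-specific components, respectively.
clip_model = torch.nn.Linear(512, 512)
npc_head = torch.nn.Linear(512, 512)

optimizer = torch.optim.AdamW(
    [
        {"params": clip_model.parameters(), "lr": 5e-7},  # reported CLIP learning rate
        {"params": npc_head.parameters(), "lr": 2e-7},    # reported NPC learning rate
    ],
    weight_decay=0.2,  # reported weight decay
)

NUM_EPOCHS = 5    # reported number of training epochs
BATCH_SIZE = 256  # reported mini-batch size
TAU = 0.99        # reported hyperparameter τ

Using per-group learning rates in a single AdamW instance is one straightforward way to realize the two reported rates; training the backbone and the added module with separate optimizers would be an equally plausible reading of the setup.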