Learning with Noisy Correspondence for Cross-modal Matching

Authors: Zhenyu Huang, Guocheng Niu, Xiao Liu, Wenbiao Ding, Xinyan Xiao, Hua Wu, Xi Peng

NeurIPS 2021

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental. "To verify the effectiveness of our method, we conduct experiments using image-text matching as a showcase. Extensive experiments on Flickr30K, MS-COCO, and Conceptual Captions verify the effectiveness of our method."
Researcher Affiliation: Collaboration. Zhenyu Huang (College of Computer Science, Sichuan University, China; zyhuang.gm@gmail.com); Guocheng Niu (Baidu Inc., China; niuguocheng@baidu.com); Xiao Liu (TAL Education Group; liuxiao15@tal.com); Wenbiao Ding (TAL Education Group; dingwenbiao@tal.com); Xinyan Xiao (Baidu Inc., China; xiaoxinyan@baidu.com); Hua Wu (Baidu Inc., China; wu_hua@baidu.com); Xi Peng (College of Computer Science, Sichuan University, China; pengx.gm@gmail.com)
Pseudocode: Yes. Algorithm 1: Noisy Correspondence Rectifier.
Open Source Code: No. The paper states "The code could be accessed from www.pengxi.me," but this URL is a general personal website: it neither explicitly states that it hosts the source code for the paper's methodology or experiments nor links directly to a code repository.
Open Datasets: Yes. "In the experiments, we use three benchmark datasets: Flickr30K [42], MS-COCO [23], and Conceptual Captions [35]."
Dataset Splits: Yes. "Flickr30K contains 31,000 images collected from the Flickr website, with five captions each. Following [19], we use 1,000 images for validation, 1,000 images for testing, and the rest for training. MS-COCO contains 123,287 images with five captions each. We follow the data partition in [19], which consists of 113,287 training images, 5,000 validation images, and 5,000 test images. In our experiments, we use a subset of Conceptual Captions for evaluation, named CC152K. Specifically, we randomly select 150,000 samples from the training split for training, 1,000 samples from the validation split for validation, and another 1,000 samples from the validation split for testing."
Hardware Specification: No. The paper does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used to run its experiments.
Software Dependencies: No. The paper mentions the Adam optimizer [16] but does not provide version numbers for the key software components of its own implementation (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup: Yes. "We train our network using the Adam optimizer [16] with the default parameters and a batch size of 128. In addition, we fix the margin α = 0.2 and m = 10 for the soft margin throughout the experiments."
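The CC152K construction described under Dataset Splits can be sketched as a small sampling routine. This is a minimal sketch, not the authors' code: the function name `make_cc152k_split` and the fixed seed are illustrative assumptions, and the paper does not state how its random selection was seeded.

```python
import random

def make_cc152k_split(train_pairs, val_pairs, seed=0):
    """Build a CC152K-style subset as described in the paper:
    150,000 training samples drawn from the Conceptual Captions
    training split, plus 1,000 validation and 1,000 test samples
    drawn disjointly from the validation split.
    Function name and seed are illustrative, not from the paper.
    """
    rng = random.Random(seed)
    train = rng.sample(train_pairs, 150_000)
    # One draw of 2,000 without replacement keeps val/test disjoint.
    held_out = rng.sample(val_pairs, 2_000)
    return train, held_out[:1_000], held_out[1_000:]
```

Drawing the 2,000 held-out samples in a single call guarantees the validation and test subsets do not overlap, even though both come from the same source split.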
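The Experiment Setup hyperparameters can be collected into a short configuration sketch. The constants (batch size 128, α = 0.2, m = 10) are quoted from the row above; the exponential soft-margin mapping and the name `soft_margin` are assumptions about how m might enter the margin, since the excerpt does not reproduce the formula.

```python
# Constants quoted from the experiment setup; "Adam with the default
# parameters" in PyTorch means lr=1e-3, betas=(0.9, 0.999), eps=1e-8.
BATCH_SIZE = 128   # mini-batch size
ALPHA = 0.2        # base triplet margin alpha
M = 10             # soft-margin parameter m

def soft_margin(y_hat, alpha=ALPHA, m=M):
    """Map a rectified correspondence label y_hat in [0, 1] to an
    adaptive margin in [0, alpha]. This exponential form is an
    assumption, not a formula quoted from the excerpt: y_hat = 1
    recovers the full margin alpha, y_hat = 0 gives margin 0.
    """
    return (m ** y_hat - 1) / (m - 1) * alpha
```

Under this form, confidently matched pairs (y_hat near 1) keep the full margin α = 0.2, while pairs judged as noisy correspondences (y_hat near 0) contribute almost no margin.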