Cross-modal Active Complementary Learning with Self-refining Correspondence
Authors: Yang Qin, Yuan Sun, Dezhong Peng, Joey Tianyi Zhou, Xi Peng, Peng Hu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We carry out extensive experiments on three image-text benchmarks, i.e., Flickr30K, MS-COCO, and CC152K, to verify the superior robustness of our CRCL against synthetic and real-world noisy correspondences. |
| Researcher Affiliation | Collaboration | 1 College of Computer Science, Sichuan University, Chengdu, China. 2 Centre for Frontier AI Research (CFAR) and Institute of High Performance Computing (IHPC), A*STAR, Singapore. 3 Chengdu Ruibei Yingte Information Technology Co., Ltd, Chengdu, China. 4 Sichuan Zhiqian Technology Co., Ltd, Chengdu, China. |
| Pseudocode | Yes | Algorithm 1: The pseudo-code of CRCL |
| Open Source Code | Yes | Code is available at https://github.com/QinYang79/CRCL. |
| Open Datasets | Yes | For an extensive evaluation, we use three benchmark datasets (i.e., Flickr30K [34], MSCOCO [35] and CC152K [12]) in our experiments. |
| Dataset Splits | Yes | Following [36], 30,000 images are employed for training, 1,000 images for validation, and 1,000 images for testing in our experiments. MS-COCO is a large-scale image-text dataset, which has 123,287 images, and 5 captions are given to describe each image. We follow the split of [36, 8] to carry out our experiments, i.e., 5000 validation images, 5000 test images, and the rest for training. CC152K contains 150,000 image-text pairs for training, 1,000 pairs for validation, and 1,000 pairs for testing. |
| Hardware Specification | No | No: The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running experiments. |
| Software Dependencies | No | No: The paper mentions using 'BUTD features' and 'Bi-GRU' as textual backbone, but does not provide specific version numbers for any software libraries, frameworks, or dependencies. |
| Experiment Setup | Yes | Specifically, the shared hyper-parameters are set as the same as the original works [4, 9], e.g., the batch size is 128, the word embedding size is 300, and the joint embedding dimensionality is 1,024. More specific hyper-parameters and implementation details are given in our supplementary material due to the space limitation. |
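For quick reference, the dataset splits and shared hyper-parameters quoted in the table can be collected into a small configuration sketch. This is an illustrative summary only: the dictionary layout and key names below are ours, while the numeric values are exactly those reported in the paper.

```python
# Illustrative configuration sketch; key names are hypothetical,
# values are the ones quoted in the reproducibility table above.

DATASET_SPLITS = {
    # Flickr30K: split following [36]
    "flickr30k": {"train_images": 30_000, "val_images": 1_000, "test_images": 1_000},
    # MS-COCO: 123,287 images, 5 captions per image; split of [36, 8],
    # with the remainder after validation/test used for training
    "ms_coco": {"total_images": 123_287, "val_images": 5_000, "test_images": 5_000},
    # CC152K: real-world noisy image-text pairs [12]
    "cc152k": {"train_pairs": 150_000, "val_pairs": 1_000, "test_pairs": 1_000},
}

SHARED_HYPERPARAMS = {
    "batch_size": 128,             # same as the original works [4, 9]
    "word_embedding_size": 300,
    "joint_embedding_dim": 1_024,
}
```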