Sentence-level Prompts Benefit Composed Image Retrieval
Authors: Yang Bai, Xinxing Xu, Yong Liu, Salman Khan, Fahad Khan, Wangmeng Zuo, Rick Siow Mong Goh, Chun-Mei Feng
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our proposed method performs favorably against the state-of-the-art CIR methods on the Fashion-IQ and CIRR datasets. |
| Researcher Affiliation | Collaboration | Yang Bai¹, Xinxing Xu¹, Yong Liu¹, Salman Khan²,³, Fahad Khan², Wangmeng Zuo⁴, Rick Siow Mong Goh¹, Chun-Mei Feng¹. ¹Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore; ²Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE; ³Australian National University, Canberra ACT, Australia; ⁴Harbin Institute of Technology, Harbin, China |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | fengcm.ai@gmail.com https://github.com/chunmeifeng/SPRC |
| Open Datasets | Yes | We evaluate our method on two CIR benchmarks: (1) Fashion-IQ, a fashion dataset with 77,684 images forming 30,134 triplets (Wu et al., 2021). ... (2) CIRR is a general image dataset that comprises 36,554 triplets derived from 21,552 images from the popular natural language inference dataset NLVR2 (Suhr et al., 2018). |
| Dataset Splits | Yes | We randomly split this dataset into training, validation, and test sets in an 8:1:1 ratio. |
| Hardware Specification | Yes | Our method is implemented with Pytorch on one NVIDIA RTX A100 GPU with 40GB memory. |
| Software Dependencies | No | The paper mentions 'Pytorch' but does not specify a version number or other software dependencies with versions. |
| Experiment Setup | Yes | We resize the input image size to 224 × 224 and with a padding ratio of 1.25 for uniformity (Baldrati et al., 2022b). The learning rate is initialized to 1e-5 and 2e-5 following a cosine schedule for the CIRR and Fashion-IQ datasets, respectively. The hyperparameters of prompt length and γ are set to 32 and 0.8, respectively. |
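The split, preprocessing, and schedule details quoted in the table can be made concrete with a small, dependency-free sketch. This is not the authors' code from the SPRC repository. The helper names (`split_811`, `target_pad_amounts`, `cosine_lr`) are hypothetical, and several details are assumptions: the fixed shuffle seed; the reading of "padding ratio 1.25" as "pad the shorter side until the long/short aspect ratio is at most 1.25", which is modeled on the target-padding idea credited to Baldrati et al. (2022b); and a cosine decay to zero with no warmup.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Values quoted in the table; the dictionary layout itself is illustrative.
INITIAL_LR = {"CIRR": 1e-5, "Fashion-IQ": 2e-5}
PROMPT_LENGTH = 32
GAMMA = 0.8
IMAGE_SIZE = 224
PADDING_RATIO = 1.25


def split_811(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: random 8:1:1 train/val/test split (seed is an assumption)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def target_pad_amounts(width: int, height: int,
                       target_ratio: float = PADDING_RATIO) -> Tuple[int, int]:
    """Hypothetical helper: per-side (horizontal, vertical) padding so that
    max(side) / min(side) does not exceed target_ratio (assumed interpretation)."""
    actual_ratio = max(width, height) / min(width, height)
    if actual_ratio <= target_ratio:
        return 0, 0  # already close enough to square: no padding
    # Grow the shorter side to max_side / target_ratio, split evenly on both sides.
    scaled = max(width, height) / target_ratio
    pad_w = max(int((scaled - width) / 2), 0)
    pad_h = max(int((scaled - height) / 2), 0)
    return pad_w, pad_h


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 to min_lr at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    train, val, test = split_811(range(100))
    print(len(train), len(val), len(test))  # 80 10 10
    print(target_pad_amounts(400, 200))      # (0, 60): height padded 200 -> 320
    print(cosine_lr(0, 1000, INITIAL_LR["Fashion-IQ"]))
```

After padding, the image would still be resized to `IMAGE_SIZE` × `IMAGE_SIZE`. Padding before resizing keeps very wide or tall images from being stretched, which is the usual motivation for this kind of target padding.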