Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Sentence-level Prompts Benefit Composed Image Retrieval

Authors: Yang Bai, Xinxing Xu, Yong Liu, Salman Khan, Fahad Khan, Wangmeng Zuo, Rick Siow Mong Goh, Chun-Mei Feng

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments show that our proposed method performs favorably against the state-of-the-art CIR methods on the Fashion-IQ and CIRR datasets.
Researcher Affiliation	Collaboration	Yang Bai¹, Xinxing Xu¹, Yong Liu¹, Salman Khan²,³, Fahad Khan², Wangmeng Zuo⁴, Rick Siow Mong Goh¹, Chun-Mei Feng¹. ¹Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore; ²Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE; ³Australian National University, Canberra ACT, Australia; ⁴Harbin Institute of Technology, Harbin, China
Pseudocode	No	The paper does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code	Yes	https://github.com/chunmeifeng/SPRC
Open Datasets	Yes	We evaluate our method on two CIR benchmarks: (1) Fashion-IQ a fashion dataset with 77,684 images forming 30,134 triplets (Wu et al., 2021). ... (2) CIRR is a general image dataset that comprises 36,554 triplets derived from 21,552 images from the popular natural language inference dataset NLVR2 (Suhr et al., 2018).
Dataset Splits	Yes	We randomly split this dataset into training, validation, and test sets in an 8 : 1 : 1 ratio.
Hardware Specification	Yes	Our method is implemented with Pytorch on one NVIDIA RTX A100 GPU with 40GB memory.
Software Dependencies	No	The paper mentions 'Pytorch' but does not specify a version number or other software dependencies with versions.
Experiment Setup	Yes	We resize the input image size to 224 × 224 and with a padding ratio of 1.25 for uniformity (Baldrati et al., 2022b). The learning rate is initialized to 1e-5 and 2e-5 following a cosine schedule for the CIRR and Fashion-IQ datasets, respectively. The hyperparameters of prompt length and γ are set to 32 and 0.8, respectively.
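
For readers attempting a reproduction, two of the quoted settings can be illustrated with a minimal Python sketch: the cosine learning-rate schedule from the Experiment Setup row and the 8:1:1 random split from the Dataset Splits row. This is not the authors' code. The function names `cosine_lr` and `split_8_1_1`, the absence of warmup, the decay floor of 0.0, the step counts, and the random seed are all assumptions, since the quoted text specifies none of them.

```python
import math
import random

# Initial learning rates quoted in the Experiment Setup row.
INITIAL_LR = {"CIRR": 1e-5, "Fashion-IQ": 2e-5}


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate at `step` out of `total_steps`.

    Assumptions (not stated in the source): no warmup, decay towards `min_lr`.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the end stay at min_lr.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def split_8_1_1(items, seed=0):
    """Randomly split `items` into train/val/test parts in an 8:1:1 ratio.

    Assumption: the seed is hypothetical; any remainder goes to the test part.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = (8 * n) // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for name, lr0 in INITIAL_LR.items():
        print(name, [cosine_lr(s, total, lr0) for s in (0, total // 2, total)])
    train, val, test = split_8_1_1(range(100))
    print(len(train), len(val), len(test))
```

In a PyTorch training loop, the built-in `torch.optim.lr_scheduler.CosineAnnealingLR` would be the more usual way to get the same schedule shape; the plain-Python version above only makes the formula explicit.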