Vision-by-Language for Training-Free Compositional Image Retrieval

Authors: Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We first provide the experimental details in 4.1, before showcasing the results of our CIReVL in four different ZS-CIR tasks in 4.2. Finally, we provide an in-depth analysis of our method in 4.3, highlighting its capacity as well as the impact of the various components." |
| Researcher Affiliation | Academia | "1Tübingen AI Center & University of Tübingen, 2University of Trento, 3Helmholtz Munich, 4Technical University of Munich" |
| Pseudocode | No | The paper describes the method using a visual diagram (Figure 1) and textual descriptions but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Code available at github.com/ExplainableML/Vision_by_Language." |
| Open Datasets | Yes | "We use the CIRR (Liu et al., 2021), CIRCO (Baldrati et al., 2023), Fashion-IQ (Wu et al., 2021) and GeneCIS (Vaze et al., 2023) datasets which have all been used for CIR." |
| Dataset Splits | Yes | "The results on the CIRCO validation set in Table 4 illustrate that the reasoning is critical to the overall performance." "We provide the results on the validation set of the Fashion-IQ benchmark in Tab. 2." |
| Hardware Specification | Yes | "For our experiments we use PyTorch (Paszke et al., 2019), extending the public codebase of Baldrati et al. (2023), and using clusters of NVIDIA V100 and A100s." |
| Software Dependencies | No | The paper mentions software like PyTorch, CLIP, BLIP-2, and various LLMs (GPT-3.5-turbo, Vicuna-13B, Llama2-70B, GPT-4) but does not provide specific version numbers for these software dependencies or libraries. |
| Experiment Setup | Yes | Appendix A provides the specific prompt used for the LLM: "I have an image. Given an instruction to edit the image, carefully generate a description of the edited image. I will put my image content beginning with Image Content: . The instruction I provide will begin with Instruction: . The edited description you generate should begin with Edited Description: . Each time generate one instruction and one edited description only." |
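The Appendix A prompt above can be wired into a chat-style LLM call for the caption-editing step of the pipeline (BLIP-2 caption in, edited description out). The sketch below is an illustrative assumption about how the messages might be assembled; the function name `build_edit_messages` and the message layout are not from the paper's code.

```python
# Hedged sketch: packing the Appendix A prompt (quoted verbatim from the
# paper) into a chat-messages structure for an LLM caption-editing call.
# The helper name and message format are assumptions for illustration.

SYSTEM_PROMPT = (
    "I have an image. Given an instruction to edit the image, carefully "
    "generate a description of the edited image. I will put my image content "
    "beginning with Image Content: . The instruction I provide will begin "
    "with Instruction: . The edited description you generate should begin "
    "with Edited Description: . Each time generate one instruction and one "
    "edited description only."
)

def build_edit_messages(image_caption: str, instruction: str) -> list:
    """Combine a captioning-model output and the user's edit instruction
    using the 'Image Content:' / 'Instruction:' markers the system prompt
    defines, ready to send to a chat-completion style LLM API."""
    user_content = f"Image Content: {image_caption}\nInstruction: {instruction}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]

# Hypothetical example inputs (not from the paper):
messages = build_edit_messages(
    image_caption="a brown dog sitting on green grass",
    instruction="make the dog a cat",
)
print(messages[1]["content"])
```

The LLM's reply would then be stripped of the `Edited Description:` prefix and embedded with the CLIP text encoder to rank candidate images.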