Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval
Authors: Yongchao Du, Min Wang, Wengang Zhou, Shuping Hui, Houqiang Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that the proposed ISA could better cope with the real retrieval scenarios and further improve retrieval accuracy and efficiency. In this section, we conduct a series of experiments to evaluate our ISA on the task of zero-shot composed image retrieval. |
| Researcher Affiliation | Academia | Yongchao Du1, Min Wang2 , Wengang Zhou1,2 , Shuping Hui1, Houqiang Li1,2 1CAS Key Laboratory of Technology in GIPAS, University of Science and Technology of China 2Institute of Artificial Intelligence, Hefei Comprehensive National Science Center |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, "We re-implement their methods on BLIP with the open-source codes for fair comparison." This refers to external open-source code for other methods, not the authors' own code for their proposed method, nor does it provide a link to their own code. |
| Open Datasets | Yes | Three datasets are utilized to verify the effectiveness of our method, including CIRR Liu et al. (2021), Fashion IQ Wu et al. (2021) and CIRCO Baldrati et al. (2023a). |
| Dataset Splits | Yes | CIRR Liu et al. (2021) is a dataset of the natural domain and includes 21552 real-life images. It involves 36554 triplets, which are randomly assigned as 80% for training, 10% for validation and 10% for test. Fashion IQ Wu et al. (2021) ... The training set comprises 18000 triplets and 46609 images in total. The validation set includes 15537 images and 6017 triplets. |
| Hardware Specification | Yes | The proposed method is implemented on the open-source PyTorch framework on a server with 4 NVIDIA GeForce RTX 3090 GPUs, with the batch size set to 320. |
| Software Dependencies | No | The paper mentions the "open source Pytorch framework" but does not specify its version number or the versions of any other software dependencies. |
| Experiment Setup | Yes | For the adaptive token learner, the token length L is set to 6, and the two hidden dimensions of the feed-forward layer are set to 256 and 512, respectively. The AdamW optimizer with a 3e-4 learning rate is adopted, and the framework is trained for 20 epochs with 5 epochs of linear warm-up and 15 epochs of cosine annealing. The proposed method is implemented on the open-source PyTorch framework on a server with 4 NVIDIA GeForce RTX 3090 GPUs, with the batch size set to 320. |
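The reported optimization schedule (AdamW at 3e-4, 5 epochs of linear warm-up, then 15 epochs of cosine annealing) can be sketched with standard PyTorch schedulers. This is a minimal reconstruction, not the authors' code: the `torch.nn.Linear` model is a hypothetical stand-in for their network, and per-epoch (rather than per-step) scheduler stepping is an assumption.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

# Hypothetical stand-in model; the paper's architecture is not released.
model = torch.nn.Linear(512, 256)

EPOCHS, WARMUP, BASE_LR = 20, 5, 3e-4
optimizer = AdamW(model.parameters(), lr=BASE_LR)

# 5 epochs of linear warm-up to 3e-4, then 15 epochs of cosine annealing,
# matching the schedule reported in the paper (epoch-level stepping assumed).
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.01, total_iters=WARMUP),
        CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP),
    ],
    milestones=[WARMUP],
)

lrs = []
for epoch in range(EPOCHS):
    # ... training loop over batches (batch size 320 in the paper) would go here ...
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```

The learning rate ramps up to its 3e-4 peak at the end of warm-up and decays toward zero by epoch 20.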