Decomposing Semantic Shifts for Composed Image Retrieval
Authors: Xingyu Yang, Daqing Liu, Heng Zhang, Yong Luo, Chaoyue Wang, Jing Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that the proposed SSN demonstrates a significant improvement of 5.42% and 1.37% on the CIRR and Fashion IQ datasets, respectively, and establishes a new state-of-the-art performance. |
| Researcher Affiliation | Collaboration | Xingyu Yang1,2*, Daqing Liu3, Heng Zhang4, Yong Luo1,2, Chaoyue Wang3, Jing Zhang5. 1School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China; 2Hubei Luojia Laboratory, Wuhan, China; 3JD Explore Academy, JD.com, China; 4Gaoling School of Artificial Intelligence, Renmin University of China, China; 5School of Computer Science, The University of Sydney, Australia |
| Pseudocode | No | The paper describes the model architecture and processes in detail, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/starxing-yuu/SSN. |
| Open Datasets | Yes | The CIRR dataset (Liu et al. 2021) is a released open-domain dataset for the CIR task... The Fashion IQ dataset (Wu et al. 2021) is a realistic dataset for interactive image retrieval in the fashion domain. |
| Dataset Splits | Yes | In 36,554 triplets, 80% are for training, 10% are for validation, and 10% are for evaluation. |
| Hardware Specification | Yes | All experiments can be implemented with PyTorch on a single NVIDIA RTX 3090 Ti GPU. |
| Software Dependencies | No | The paper mentions PyTorch but does not provide specific version numbers for software dependencies or other libraries. |
| Experiment Setup | Yes | The hidden dimension of the 1-layer 8-head transformer encoder is set to 512. The temperature λ of the main retrieval loss (in Eq.(7)) is equal to 100. Note that for Fashion IQ, we fix the image encoder after one training epoch and fine-tune the text encoder only. We adopt the AdamW optimizer with an initial learning rate of 5e-5 to train the whole model. We apply the step scheduler to decay the learning rate by a factor of 10 every 10 epochs. The batch size is set to 128 and the network is trained for 50 epochs. |
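The reported training schedule (initial learning rate 5e-5, step decay by a factor of 10 every 10 epochs, 50 epochs total) can be sketched as a small helper; this mirrors what PyTorch's `StepLR(step_size=10, gamma=0.1)` would produce alongside `AdamW`, though the function name and structure here are illustrative, not taken from the released code.

```python
# Step-decay schedule from the reported setup: base LR 5e-5,
# multiplied by 0.1 after every 10 completed epochs, for 50 epochs.
def lr_at_epoch(epoch, base_lr=5e-5, gamma=0.1, step_size=10):
    """Learning rate in effect during the given (0-indexed) epoch."""
    return base_lr * gamma ** (epoch // step_size)

# Full 50-epoch schedule: 5e-5 for epochs 0-9, 5e-6 for 10-19, ...
schedule = [lr_at_epoch(e) for e in range(50)]
```

For example, epochs 0-9 train at 5e-5, epochs 10-19 at 5e-6, and the final epochs (40-49) at 5e-9.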