SceneDiff: Generative Scene-Level Image Retrieval with Text and Sketch Using Diffusion Models
Authors: Ran Zuo, Haoxiang Hu, Xiaoming Deng, Cangjun Gao, Zhengming Zhang, Yu-Kun Lai, Cuixia Ma, Yong-Jin Liu, Hongan Wang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method outperforms the state-of-the-art works through extensive experiments, providing a novel insight into the related retrieval field. |
| Researcher Affiliation | Academia | Ran Zuo1,2, Haoxiang Hu1,2, Xiaoming Deng1,2, Cangjun Gao1,2, Zhengming Zhang1,2, Yu-Kun Lai3, Cuixia Ma1,2,4, Yong-Jin Liu5, Hongan Wang1,2. 1Beijing Key Laboratory of Human-Computer Interaction, Institute of Software, Chinese Academy of Sciences; 2University of Chinese Academy of Sciences; 3Cardiff University; 4Key Laboratory of System Software and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; 5Tsinghua University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides mathematical equations and descriptive text for its methods. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository. |
| Open Datasets | Yes | (1) Sketchy COCO [Gao et al., 2020] [...] to select 1,015 pairs for training and 210 for testing. (2) FS-COCO [Chowdhury et al., 2022] [...] which includes 7,000/3,000 train/test pairs. (3) SFSD [Zhang et al., 2023b] [...] We divide the dataset into 8,480/3,635 train/test pairs. |
| Dataset Splits | Yes | (1) Sketchy COCO [Gao et al., 2020] [...] to select 1,015 pairs for training and 210 for testing. (2) FS-COCO [Chowdhury et al., 2022] [...] which includes 7,000/3,000 train/test pairs. (3) SFSD [Zhang et al., 2023b] [...] We divide the dataset into 8,480/3,635 train/test pairs. |
| Hardware Specification | Yes | All experiments are conducted on one NVIDIA A100 80G GPU with learning rate 1e-6 and batch size 4. |
| Software Dependencies | Yes | Then we construct the diffusion-based retrieval framework by utilizing the pre-trained SD model with version 1.4, along with its associated pre-trained autoencoder. |
| Experiment Setup | Yes | All experiments are conducted on one NVIDIA A100 80G GPU with learning rate 1e-6 and batch size 4. [...] The parameters are set as follows: the number of samplings n is 3, the number of sampling steps k is 2, and λ1 and λ2 are 1 and 0.1, respectively. |
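
As a reading aid, the quoted experiment setup can be collected into a single configuration sketch. The key names below are ours, not the authors'; the values (SD v1.4 with its pre-trained autoencoder, one A100 80G, learning rate 1e-6, batch size 4, n = 3 samplings, k = 2 sampling steps, λ1 = 1, λ2 = 0.1) are taken directly from the quoted passages:

```python
# Hypothetical summary of the reported setup; key names are illustrative.
EXPERIMENT_CONFIG = {
    "base_model": "stable-diffusion-v1.4",  # pre-trained SD model + its autoencoder
    "gpu": "NVIDIA A100 80G",               # single GPU, per the paper
    "learning_rate": 1e-6,
    "batch_size": 4,
    "num_samplings_n": 3,                   # number of samplings n
    "num_sampling_steps_k": 2,              # number of sampling steps k
    "lambda_1": 1.0,                        # loss weight λ1
    "lambda_2": 0.1,                        # loss weight λ2
}

if __name__ == "__main__":
    for key, value in EXPERIMENT_CONFIG.items():
        print(f"{key}: {value}")
```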