Few-Shot Composition Learning for Image Retrieval with Prompt Tuning
Authors: Junda Wu, Rui Wang, Handong Zhao, Ruiyi Zhang, Chaochao Lu, Shuai Li, Ricardo Henao
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple benchmarks show that our proposed model can yield superior performance when trained with only few query-target pairs. |
| Researcher Affiliation | Collaboration | New York University; Duke University; Adobe Research; University of Cambridge; Shanghai Jiao Tong University; King Abdullah University of Science and Technology (KAUST) |
| Pseudocode | No | The paper describes the proposed method in text and with diagrams, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We evaluate our proposed model on three datasets: Fashion IQ (Guo et al. 2019), CelebA (Liu et al. 2015) and B2W (Forbes et al. 2019). |
| Dataset Splits | No | The paper describes the N-shot sampling used to build the training data but does not explicitly specify training/validation/test splits (e.g., percentages or counts for a validation set). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states "We implement all the baselines and our method with PyTorch." but does not provide version numbers for PyTorch or any other software dependency. |
| Experiment Setup | Yes | The feature dimensions for both the text encoder and the image encoder are 512. We set the margin m in Eq. 2 to 0.02. PromptCLIP's trainable prompts are initialized from a Gaussian distribution with zero mean and a standard deviation of 0.02. The balancing parameter β in Eq. 8 is set to 0.5. The samples in the triplet sets are shuffled for each epoch. We use the Adam (Kingma and Ba 2014) optimizer with a learning rate of 5 × 10⁻⁵ and a decay rate of 0.95. |
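
To make the reported setup concrete, the sketch below wires the stated hyperparameters into a minimal PyTorch training skeleton. Only the numeric values (512-d features, margin 0.02, prompt initialization std 0.02, β = 0.5, Adam with learning rate 5 × 10⁻⁵ and decay 0.95) come from the paper; the prompt length `num_prompts`, the `training_step` helper, and the `aux_loss` placeholder for the second term of Eq. 8 are assumptions introduced here for illustration.

```python
# Minimal sketch of the reported training setup (not the authors' released code).
import torch
import torch.nn as nn

embed_dim = 512   # feature dimension of the text and image encoders (paper)
margin = 0.02     # triplet margin m in Eq. 2 (paper)
beta = 0.5        # balancing parameter in Eq. 8 (paper)

# Trainable prompts drawn from a zero-mean Gaussian with std 0.02 (paper);
# the prompt length is an assumption, as it is not given in this excerpt.
num_prompts = 8
prompts = nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)

# Adam with lr 5e-5 and a 0.95 decay rate, as reported.
optimizer = torch.optim.Adam([prompts], lr=5e-5)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

triplet_loss = nn.TripletMarginLoss(margin=margin)

def training_step(anchor, positive, negative, aux_loss):
    """One update over a (shuffled) triplet batch; aux_loss stands in for the
    other term of Eq. 8, whose exact form is not reproduced in this excerpt."""
    loss = triplet_loss(anchor, positive, negative) + beta * aux_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# scheduler.step() would be called once per epoch to apply the 0.95 decay.
```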