Few-Shot Composition Learning for Image Retrieval with Prompt Tuning

Authors: Junda Wu, Rui Wang, Handong Zhao, Ruiyi Zhang, Chaochao Lu, Shuai Li, Ricardo Henao

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on multiple benchmarks show that the proposed model can yield superior performance when trained with only a few query-target pairs.
Researcher Affiliation | Collaboration | New York University; Duke University; Adobe Research; University of Cambridge; Shanghai Jiao Tong University; King Abdullah University of Science and Technology (KAUST)
Pseudocode | No | The paper describes the proposed method in text and with diagrams, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | We evaluate our proposed model on three datasets: FashionIQ (Guo et al. 2019), CelebA (Liu et al. 2015) and B2W (Forbes et al. 2019).
Dataset Splits | No | The paper describes the N-shot sampling of training data but does not explicitly specify training/validation/test splits (e.g., percentages or counts for a validation set).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper states "We implement all the baselines and our method with PyTorch," but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The feature dimensions for both the text and image encoders are 512. We set the margin m in Eq. 2 to 0.02. PromptCLIP's trainable prompts are initialized from a zero-mean Gaussian with a standard deviation of 0.02. The balancing parameter β in Eq. 8 is set to 0.5. The samples in the triplet sets are shuffled each epoch. We use the Adam (Kingma and Ba 2014) optimizer with a learning rate of 5 × 10⁻⁵ and a decay rate of 0.95.
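
For concreteness, below is a minimal PyTorch sketch of the reported hyperparameters. Since the paper releases no code, the variable names, the prompt length N_PROMPT_TOKENS, and the reading of the decay rate as a per-epoch exponential schedule are all assumptions for illustration, not the authors' implementation; Eq. 2 and Eq. 8 themselves are not reproduced here.

```python
import torch

# Hypothetical sketch of the reported setup; names and prompt length are
# assumptions, as the paper's source code is not available.
EMBED_DIM = 512       # feature dimension of both text and image encoders
MARGIN = 0.02         # margin m in Eq. 2
BETA = 0.5            # balancing parameter beta in Eq. 8
N_PROMPT_TOKENS = 8   # assumed prompt length (not stated in this excerpt)

# Trainable prompts drawn from a zero-mean Gaussian with std 0.02.
prompts = torch.nn.Parameter(
    torch.empty(N_PROMPT_TOKENS, EMBED_DIM).normal_(mean=0.0, std=0.02)
)

# Adam with learning rate 5e-5 and a decay rate of 0.95, interpreted
# here as a per-epoch multiplicative (exponential) learning-rate decay.
optimizer = torch.optim.Adam([prompts], lr=5e-5)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
```

A margin-based objective with m = 0.02 (e.g., torch.nn.TripletMarginLoss(margin=0.02)) could then be applied over the shuffled triplet sets, with β weighting the two loss terms combined in Eq. 8.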