What Makes Good Examples for Visual In-Context Learning?

Authors: Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | we conduct extensive research and identify a critical problem: downstream performance is highly sensitive to the choice of visual in-context examples. To address this problem, we propose a prompt retrieval framework specifically for large vision models, allowing the selection of in-context examples to be fully automated. Concretely, we provide two implementations: (i) an unsupervised prompt retrieval method based on nearest example search using an off-the-shelf model, and (ii) a supervised prompt retrieval method, which trains a neural network to choose examples that directly maximize in-context learning performance. Both methods do not require access to the internal weights of large vision models. Our results demonstrate that our methods can bring non-trivial improvements to visual in-context learning in comparison to the commonly-used random selection. (A hedged sketch of the unsupervised retrieval step is given after the table.)
Researcher Affiliation | Academia | Yuanhan Zhang (1), Kaiyang Zhou (2), Ziwei Liu (1). 1: S-Lab, Nanyang Technological University, Singapore. 2: Hong Kong Baptist University, Hong Kong. {yuanhan002, ziwei.liu}@ntu.edu.sg, kyzhou@hkbu.edu.hk
Pseudocode | No | The paper describes its methods in text and diagrams (e.g., Figure 2) but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and models are available at https://github.com/ZhangYuanhan-AI/visual_prompt_retrieval.
Open Datasets | Yes | Foreground segmentation: We use Pascal-5i [21]... Single object detection: The experiments are done on Pascal VOC [7]... Colorization: We use ImageNet-2012 [19]...
Dataset Splits | No | The paper mentions using a 'training dataset' and a 'test set' for evaluation, but does not explicitly describe a separate validation split used during the training of their prompt retrieval methods.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running its experiments.
Software Dependencies | No | The paper mentions models like CLIP, MAE, and ViT, and uses SGD, but it does not specify version numbers for any software dependencies or libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Training details for the supervised model: the supervised model is trained for 200 epochs using SGD. The initial learning rate is set to 0.005, decayed by the cosine annealing rule. Each task needs its own supervised model for SupPR. (A hedged sketch of this training schedule is given after the table.)
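As a concrete illustration of the unsupervised prompt retrieval described in the Research Type row, below is a minimal sketch of nearest example search, assuming CLIP ViT-B/32 as the off-the-shelf feature extractor. File paths, function names, and the top-k interface are illustrative assumptions, not the authors' released code (see the repository linked above for that).

```python
# Unsupervised prompt retrieval (UnsupPR) sketch: pick the training example whose
# off-the-shelf feature is closest to the query image and use it as the in-context
# example. CLIP ViT-B/32 is an assumption here; any frozen encoder could be swapped in.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def embed(paths):
    """L2-normalized CLIP image features for a list of image paths."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths]).to(device)
    feats = model.encode_image(batch).float()
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def retrieve_prompt(query_path, candidate_paths, top_k=1):
    """Return the top-k candidate indices by cosine similarity to the query."""
    q = embed([query_path])        # (1, d)
    c = embed(candidate_paths)     # (N, d)
    sims = (q @ c.T).squeeze(0)    # cosine similarity, since features are normalized
    return sims.topk(top_k).indices.tolist()

# Usage (illustrative): the retrieved (image, label) pair is placed alongside the
# query in the grid fed to the frozen large vision model.
# best = retrieve_prompt("query.jpg", train_image_paths)
```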
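The Experiment Setup row's schedule (SGD, initial learning rate 0.005, cosine annealing, 200 epochs, a separate model per task) could be instantiated roughly as follows. This is a hedged PyTorch sketch: the retrieval head, the dummy data, and the cross-entropy surrogate for "maximize in-context learning performance" are assumptions, not the paper's exact objective or code.

```python
# Sketch of the reported SupPR training schedule: SGD, initial LR 0.005,
# cosine annealing, 200 epochs, one model per task. Head, data, and loss are placeholders.
import torch
from torch import nn, optim

EPOCHS = 200
retriever = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 256))  # assumed feature head

optimizer = optim.SGD(retriever.parameters(), lr=0.005, momentum=0.9)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

def dummy_batch(batch_size=32, num_candidates=8, dim=512):
    """Dummy stand-in for real data: a query feature, candidate prompt features,
    and the index of the candidate that performed best with the frozen LVM."""
    query = torch.randn(batch_size, dim)
    candidates = torch.randn(batch_size, num_candidates, dim)
    best = torch.randint(0, num_candidates, (batch_size,))
    return query, candidates, best

for epoch in range(EPOCHS):
    query, candidates, best = dummy_batch()
    q = nn.functional.normalize(retriever(query), dim=-1)       # (B, d)
    c = nn.functional.normalize(retriever(candidates), dim=-1)  # (B, K, d)
    scores = torch.einsum("bd,bkd->bk", q, c)                    # cosine similarity per candidate
    # Treat the best-performing candidate as the positive class: a simple
    # cross-entropy surrogate for training the retriever toward examples that
    # maximize in-context learning performance.
    loss = nn.functional.cross_entropy(scores, best)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```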