LaViP: Language-Grounded Visual Prompting

Authors: Nilakshan Kunananthaseelan, Jing Zhang, Mehrtash Harandi

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We will empirically demonstrate that, compared to prior art, grounding visual prompts with language enhances both the accuracy and speed of adaptation. Moreover, our algorithm excels in base-to-novel class generalization, overcoming limitations of visual prompting and exhibiting the capacity to generalize beyond seen classes. We thoroughly assess and evaluate our method across a variety of image recognition datasets, such as EuroSAT, UCF101, DTD, and CLEVR, spanning different learning situations, including few-shot adaptation, base-to-novel class generalization, and transfer learning." |
| Researcher Affiliation | Academia | Nilakshan Kunananthaseelan¹, Jing Zhang², Mehrtash Harandi¹ — ¹Department of Electrical and Computer Systems Engineering, Monash University; ²College of Engineering and Computer Science, Australian National University |
| Pseudocode | Yes | "Algorithm 1 summarizes the steps involved in our method." |
| Open Source Code | Yes | https://github.com/NilakshanKunananthaseelan/LaViP |
| Open Datasets | Yes | "We extensively evaluate LaViP capability on 12 benchmark datasets (refer to Appendix B.3) under three distinct scenarios." |
| Dataset Splits | Yes | First, its transferability in limited-data settings is assessed through few-shot learning, where it learns from 16 shots for training and 4 shots for validation. Next, its generalizability is examined by evaluating its ability to learn from base classes and apply that knowledge to unseen novel classes. Finally, the full dataset is used for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | No | The paper states, "More details are provided in Appendix B2," implying that experimental setup details might be found there, but it does not include concrete hyperparameters or training configurations in the main text. |