Visual Classification via Description from Large Language Models

Authors: Sachit Menon, Carl Vondrick

ICLR 2023

Reproducibility variables (each entry gives the result and the supporting LLM response)

Research Type: Experimental
LLM response: Extensive experiments show our framework has numerous advantages past interpretability. We show improvements in accuracy on ImageNet across distribution shifts; demonstrate the ability to adapt VLMs to recognize concepts unseen during training; and illustrate how descriptors can be edited to effectively mitigate bias compared to the baseline.
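The framework in this entry classifies images by description. Instead of matching an image against a bare class name, it compares the image with several short textual descriptors of each category. The sketch below assumes a scoring rule along the lines of the paper's (the report itself does not state it): a category's score is the mean image-text similarity over its descriptors, and the prediction is the highest-scoring category. The 3-dimensional vectors, class names, and descriptors are hypothetical stand-ins for real vision-language (e.g. CLIP) embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def class_score(image_emb, descriptor_embs):
    """Mean similarity between the image and every descriptor of one class."""
    if not descriptor_embs:
        raise ValueError("each class needs at least one descriptor")
    return sum(cosine(image_emb, d) for d in descriptor_embs) / len(descriptor_embs)

def classify(image_emb, descriptors):
    """Return the best-matching class and the per-class scores."""
    scores = {c: class_score(image_emb, embs) for c, embs in descriptors.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy embeddings standing in for encoded descriptor text.
descriptors = {
    "hen": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],    # e.g. "red comb", "feathered body"
    "tiger": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],  # e.g. "orange fur", "black stripes"
}
image_emb = [0.95, 0.05, 0.0]  # hypothetical image embedding
label, scores = classify(image_emb, descriptors)
print(label)  # → hen
```

Because the prediction is an argmax over descriptor averages, each class score can be traced back to individual descriptor similarities. Editing a class's descriptor list changes its score directly, which is the mechanism the bias-mitigation claim relies on.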
Researcher Affiliation: Academia
LLM response: Sachit Menon, Carl Vondrick; Department of Computer Science, Columbia University.

Pseudocode: No
LLM response: The paper describes processes but does not include a clearly labeled pseudocode block or algorithm.

Open Source Code: No
LLM response: We will also release code upon publication.

Open Datasets: Yes
LLM response: We consider the ImageNet dataset (Russakovsky et al., 2015) for everyday object recognition; ImageNetV2 (Kornblith et al., 2019) for distribution shift from ImageNet; CUB for fine-grained classification of birds (Wah et al., 2011); EuroSAT (Helber et al., 2019) for satellite image recognition; Places365 for scenes; Food101 (Bossard et al., 2014) for food; Oxford Pets (Parkhi et al., 2012) for common animals; and Describable Textures (Cimpoi et al., 2014) for in-the-wild patterns.

Dataset Splits: Yes
LLM response: We added two new categories to the ImageNet validation set that widely appeared on the Internet after this date: a) the Ever Given, which is the ship that blocked the Suez Canal in March 2021 (Wikipedia, 2022a), and b) the game Wordle, an online word game that went viral in January 2022 (Wikipedia, 2022c). For each category, we added five images into the existing validation set of 50,000 ImageNet images.
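If categories are represented by descriptor lists, adding the Ever Given and Wordle classes only requires new text, not retraining. A minimal sketch under that assumption: the descriptor strings and the helper names `extend_label_set` and `extended_split_size` are hypothetical (the report does not list the descriptors used), and the split arithmetic follows from five images per new category added to the standard 50,000-image, 1,000-class ImageNet validation set.

```python
# Standard ImageNet validation set size and class count, plus the report's five images per new class.
ORIGINAL_VAL_SIZE = 50_000
ORIGINAL_NUM_CLASSES = 1_000
IMAGES_PER_NEW_CATEGORY = 5

# Hypothetical descriptors for the two added categories; not taken from the paper.
NEW_CATEGORIES = {
    "Ever Given": ["a large container ship", "stacked shipping containers", "a ship wedged in a canal"],
    "Wordle": ["a grid of letter tiles", "green and yellow squares", "a word puzzle on a screen"],
}

def extend_label_set(descriptors, new_categories):
    """Return a new class-to-descriptor map with extra classes; no model weights are changed."""
    extended = {name: list(descs) for name, descs in descriptors.items()}
    for name, descs in new_categories.items():
        if name in extended:
            raise ValueError(f"category already present: {name}")
        extended[name] = list(descs)
    return extended

def extended_split_size(original_size, num_new_categories, images_per_category):
    """Validation-set size after adding a fixed number of images per new category."""
    return original_size + num_new_categories * images_per_category

print(extended_split_size(ORIGINAL_VAL_SIZE, len(NEW_CATEGORIES), IMAGES_PER_NEW_CATEGORY))  # → 50010
print(ORIGINAL_NUM_CLASSES + len(NEW_CATEGORIES))  # → 1002
```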
Hardware Specification: No
LLM response: The paper does not provide specific hardware details such as GPU or CPU models, or cloud computing resources used for the experiments.

Software Dependencies: No
LLM response: The paper mentions using CLIP and GPT-3 (specifically text-davinci-002) but does not provide specific version numbers for software dependencies or libraries.

Experiment Setup: Yes
LLM response: We sample from the text-davinci-002 model with a temperature of 0.7 and a maximum token length of 100.
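The sampling settings in this entry can be written down as a request body. A minimal sketch, assuming the legacy OpenAI Completions request fields (`model`, `prompt`, `temperature`, `max_tokens`). The prompt template and the `parse_descriptors` helper are hypothetical, since the report does not reproduce the paper's prompt or post-processing. Nothing is sent over the network.

```python
import json

# Settings stated in the report.
MODEL = "text-davinci-002"
TEMPERATURE = 0.7
MAX_TOKENS = 100

# Hypothetical prompt wording; it ends with "-" so the completion continues a bulleted list.
PROMPT_TEMPLATE = (
    "Q: What are useful visual features for distinguishing a {category} in a photo?\n"
    "A: There are several useful visual features to tell there is a {category} in a photo:\n"
    "-"
)

def build_completion_request(category):
    """Build a Completions-style JSON body for one category (sending it is out of scope)."""
    return {
        "model": MODEL,
        "prompt": PROMPT_TEMPLATE.format(category=category),
        "temperature": TEMPERATURE,
        "max_tokens": MAX_TOKENS,
    }

def parse_descriptors(completion_text):
    """Split a bulleted completion into descriptors, assuming one '- ' item per line."""
    descriptors = []
    # Re-attach the leading "-" that the prompt supplied for the first item.
    for line in ("-" + completion_text).splitlines():
        item = line.strip().lstrip("-").strip()
        if item:
            descriptors.append(item)
    return descriptors

print(json.dumps(build_completion_request("lemur"), indent=2))
```

With a temperature of 0.7, sampling is stochastic, so repeated requests for the same category can return different descriptor lists; saving the returned text is what makes a given run repeatable.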