Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Visual Classification via Description from Large Language Models

Authors: Sachit Menon, Carl Vondrick

ICLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show our framework has numerous advantages past interpretability. We show improvements in accuracy on ImageNet across distribution shifts; demonstrate the ability to adapt VLMs to recognize concepts unseen during training; and illustrate how descriptors can be edited to effectively mitigate bias compared to the baseline.
Researcher Affiliation	Academia	Sachit Menon, Carl Vondrick Department of Computer Science Columbia University
Pseudocode	No	The paper describes processes but does not include a clearly labeled pseudocode block or algorithm.
Open Source Code	No	We will also release code upon publication.
Open Datasets	Yes	We consider the ImageNet dataset (Russakovsky et al., 2015) for everyday object recognition; ImageNetV2 (Kornblith et al., 2019) for distribution shift from ImageNet; CUB for fine-grained classification of birds (Wah et al., 2011); EuroSAT (Helber et al., 2019) for satellite image recognition; Places365 for scenes; Food101 (Bossard et al., 2014) for food; Oxford Pets (Parkhi et al., 2012) for common animals; and Describable Textures Cimpoi et al. (2014) for in-the-wild patterns.
Dataset Splits	Yes	We added two new categories to the ImageNet validation set that widely appeared on the Internet after this date: a) the Ever Given, which is the ship that blocked the Suez Canal in March 2021 (Wikipedia, 2022a), and b) the game Wordle, an online word game that went viral in January 2022 (Wikipedia, 2022c). For each category, we added five images into the existing validation set of 50,000 ImageNet images.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU or CPU models, or cloud computing resources used for the experiments.
Software Dependencies	No	The paper mentions using CLIP and GPT-3 (specifically text-davinci-002) but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup	Yes	We sample from the text-davinci-002 model with temperature of 0.7 and a maximum token length of 100.
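
The sampling settings quoted in the Experiment Setup row can be recorded as a small configuration object, which may help anyone trying to reproduce the description-generation step. The sketch below is only an illustration under stated assumptions: `SamplingConfig` and `build_request` are hypothetical names, the prompt is a placeholder rather than the prompt used in the paper, and no request is sent to any service. Only the model name, temperature, and maximum token length come from the row above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SamplingConfig:
    """Sampling settings quoted in the Experiment Setup row (hypothetical container)."""
    model: str = "text-davinci-002"
    temperature: float = 0.7
    max_tokens: int = 100


def build_request(prompt: str, config: SamplingConfig = SamplingConfig()) -> dict:
    """Build a completion-style payload as a plain dict; nothing is sent anywhere."""
    if not prompt.strip():
        raise ValueError("prompt must be non-empty")
    payload = asdict(config)
    payload["prompt"] = prompt
    return payload


# Placeholder prompt for illustration only; not the prompt used in the paper.
request = build_request("Describe the visual features of a hen.")
print(request)
```

Keeping the settings in one frozen object makes it harder for the temperature or token limit to drift between runs, although sampling at a nonzero temperature will still produce different text each time.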