PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts

Authors: Bang An, Sicheng Zhu, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that PerceptionCLIP achieves better generalization, group robustness, and interpretability.
Researcher Affiliation | Collaboration | (1) University of Maryland, College Park; (2) Bosch Center for Artificial Intelligence
Pseudocode | Yes | The pseudocode of PerceptionCLIP is outlined in Algorithm 1.
Open Source Code | Yes | Our code is available at https://github.com/umd-huang-lab/perceptionCLIP.
Open Datasets | Yes | We test PerceptionCLIP on ImageNet (Deng et al., 2009) and its out-of-distribution datasets, including ImageNetV2 (Recht et al., 2019), ImageNet-R (Hendrycks et al., 2021a), ImageNet-A (Hendrycks et al., 2021b), and ImageNet-Sketch (Wang et al., 2019). We also test on different data domains (e.g., satellite images), including CUB200 (Wah et al., 2011), EuroSAT (Helber et al., 2019), Places365 (Zhou et al., 2017), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), and Oxford Pets (Parkhi et al., 2012).
Dataset Splits | No | Experiments on Waterbirds and CelebA are conducted on their training sets.
Hardware Specification | No | No specific hardware details (e.g., GPU model, CPU type, or memory specifications) used for running the experiments are provided.
Software Dependencies | No | The paper mentions software such as CLIP and GPT-4 but does not specify version numbers for any key software components or libraries.
Experiment Setup | Yes | All the above experiments use the ClassAttr version of PerceptionCLIP and the intervention of setting a temperature τ in the first step (i.e., inferring contextual attributes). We found that mildly smoothing the estimation by setting τ to 3 or 5 usually gives the best performance. Without a good prior for the temperature, setting τ = 1 also yields reasonably good results. The reported numbers use a temperature selected from {1, 3, 5, 10} that performs best on each dataset.
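The temperature intervention described in the setup row can be illustrated with a minimal sketch: dividing the image-to-attribute similarity scores by τ > 1 before the softmax mildly flattens the inferred distribution over contextual attributes. This is only an illustrative implementation of temperature smoothing, not the paper's code; the function name and the example similarity scores are hypothetical.

```python
import math

def smoothed_attribute_probs(similarities, tau=3.0):
    """Temperature-smoothed softmax over contextual-attribute scores.

    `similarities` is a hypothetical list of image-text similarity
    scores, one per candidate contextual attribute. tau > 1 smooths
    the estimated distribution (the report finds tau in {3, 5} tends
    to work best); tau = 1 recovers the plain softmax.
    """
    scaled = [s / tau for s in similarities]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical raw similarity scores for three candidate attributes:
probs_sharp = smoothed_attribute_probs([2.0, 1.0, 0.5], tau=1.0)
probs_smooth = smoothed_attribute_probs([2.0, 1.0, 0.5], tau=5.0)
```

With a larger τ the distribution moves toward uniform, so the most likely attribute receives less of the probability mass; the smoothed estimate then conditions the second (classification) step less aggressively.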