PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts
Authors: Bang An, Sicheng Zhu, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that PerceptionCLIP achieves better generalization, group robustness, and interpretability. |
| Researcher Affiliation | Collaboration | 1University of Maryland, College Park 2Bosch Center for Artificial Intelligence |
| Pseudocode | Yes | The pseudocode of PerceptionCLIP is outlined in Algorithm 1. |
| Open Source Code | Yes | Our code is available at https://github.com/umd-huang-lab/perceptionCLIP. |
| Open Datasets | Yes | We test PerceptionCLIP on ImageNet (Deng et al., 2009) and its out-of-distribution datasets, including ImageNetV2 (Recht et al., 2019), ImageNet-R (Hendrycks et al., 2021a), ImageNet-A (Hendrycks et al., 2021b), and ImageNet-Sketch (Wang et al., 2019). We also test on different data domains (e.g., satellite images), including CUB200 (Wah et al., 2011), EuroSAT (Helber et al., 2019), Places365 (Zhou et al., 2017), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), and Oxford Pets (Parkhi et al., 2012). |
| Dataset Splits | No | Experiments on Waterbirds and CelebA are conducted on their training set. |
| Hardware Specification | No | No specific hardware details (e.g., GPU model, CPU type, or memory specifications) used for running experiments were provided. |
| Software Dependencies | No | The paper mentions software like CLIP and GPT-4 but does not specify version numbers for any key software components or libraries. |
| Experiment Setup | Yes | All the above experiments use the ClassAttr version of PerceptionCLIP and the intervention of setting a temperature τ in the first step (i.e., inferring contextual attributes). We found that mildly smoothing the estimation by setting τ to 3 or 5 usually gives the best performance. When we do not have a good prior for the temperature, simply setting it to 1 also yields relatively good results. The reported numbers in our experiments use a temperature selected from {1, 3, 5, 10} that performs best on the particular dataset. |
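The temperature intervention described in the setup row can be sketched as a temperature-smoothed softmax over contextual-attribute similarity scores. This is a minimal illustrative sketch, not the authors' implementation: the function name, the NumPy formulation, and the example similarity scores are all assumptions, and in the actual method the inputs would be CLIP image-text similarity logits for each candidate contextual attribute.

```python
import numpy as np

def infer_attribute_distribution(similarities, tau=3.0):
    """Temperature-smoothed softmax over contextual-attribute similarities.

    A larger tau flattens the distribution (milder, smoother estimates);
    the paper reports tau in {3, 5} usually performing best, with tau=1
    as a reasonable default when no prior is available.
    """
    logits = np.asarray(similarities, dtype=float) / tau
    logits -= logits.max()          # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()  # normalized attribute distribution

# Hypothetical similarity scores for three candidate contextual attributes:
scores = [2.0, 1.0, 0.5]
p_sharp = infer_attribute_distribution(scores, tau=1.0)
p_smooth = infer_attribute_distribution(scores, tau=3.0)
```

With `tau=3.0` the resulting distribution is flatter than with `tau=1.0`, so no single inferred attribute dominates the second (conditioning) step as strongly.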