Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models
Authors: Peiyan Zhang, Haoyang Liu, Chaozhuo Li, Xing Xie, Sunghun Kim, Haohan Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider four different scenarios, ranging from the basic benchmark MNIST (LeCun et al., 1998), through CIFAR10 (Krizhevsky et al., 2009), 9-class ImageNet (Santurkar et al., 2019), to full-fledged 1000-class ImageNet (Deng et al., 2009). |
| Researcher Affiliation | Collaboration | Peiyan Zhang1, Haoyang Liu2, Chaozhuo Li3, Xing Xie3, Sunghun Kim1 and Haohan Wang2; 1Hong Kong University of Science and Technology, 2University of Illinois at Urbana-Champaign, 3Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1 Perturbed Image Generation with Foundation Models |
| Open Source Code | No | The paper provides links to pretrained models and external libraries used (e.g., in Section M), but it does not explicitly state that the source code for its own proposed methodology is open-source or provide a link to it. |
| Open Datasets | Yes | We consider four different scenarios, ranging from the basic benchmark MNIST (LeCun et al., 1998), through CIFAR10 (Krizhevsky et al., 2009), 9-class ImageNet (Santurkar et al., 2019), to full-fledged 1000-class ImageNet (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions the use of 'validation' in the context of 'Validation Rate (VR)' as a metric for perturbed images and the role of 'ensemble of multiple foundation models to validate the correctness of labels', but it does not provide explicit details about the train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | Our model evaluations are done on 8 NVIDIA V100 GPUs. With our Sparsified VQGAN model, our method is also feasible to work with a small amount of GPU resources. As shown in Appendix I, the proposed protocol can work on a single NVIDIA V100 GPU efficiently. |
| Software Dependencies | No | The paper mentions several software components and models like VQGAN (Esser et al., 2021), CLIP (Radford et al., 2021), timm library (Wightman, 2019), LASSO, and saga (Defazio et al., 2014) as a solver. However, it does not provide specific version numbers for these software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | The perturbation step size for each iteration is 0.001. The total number of iterations allowed (computation budget B) is 50. For ImageNet, we resize all images to 224 × 224 px. We also center and re-scale the color values with µRGB = [0.485, 0.456, 0.406] and σRGB = [0.229, 0.224, 0.225]. |
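The preprocessing quoted above (resize to 224 × 224, then center and re-scale each color channel with the stated mean and standard deviation) can be sketched as a small PyTorch helper. This is a minimal illustration of the stated setup, not the paper's released code; the function name `preprocess` is our own.

```python
import torch

# ImageNet normalization constants quoted in the experiment setup.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(image: torch.Tensor) -> torch.Tensor:
    """Resize a (3, H, W) float image in [0, 1] to 224x224 and normalize per channel."""
    resized = torch.nn.functional.interpolate(
        image.unsqueeze(0), size=(224, 224), mode="bilinear", align_corners=False
    ).squeeze(0)
    return (resized - MEAN) / STD

x = torch.rand(3, 256, 256)   # dummy RGB image
y = preprocess(x)
assert y.shape == (3, 224, 224)
```

With these constants, a mid-gray input (all channels ≈ 0.5) maps to values near zero, which is the point of the centering step before iterating the perturbation (step size 0.001, budget B = 50).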