Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models

Authors: Peiyan Zhang, Haoyang Liu, Chaozhuo Li, Xing Xie, Sunghun Kim, Haohan Wang

ICLR 2024

Reproducibility

| Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We consider four different scenarios, ranging from the basic benchmark MNIST (LeCun et al., 1998), through CIFAR10 (Krizhevsky et al., 2009), 9-class ImageNet (Santurkar et al., 2019), to full-fledged 1000-class ImageNet (Deng et al., 2009)." |
| Researcher Affiliation | Collaboration | Peiyan Zhang¹, Haoyang Liu², Chaozhuo Li³, Xing Xie³, Sunghun Kim¹, and Haohan Wang² — ¹Hong Kong University of Science and Technology; ²University of Illinois at Urbana-Champaign; ³Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1: Perturbed Image Generation with Foundation Models |
| Open Source Code | No | The paper provides links to pretrained models and external libraries used (e.g., in Section M), but it does not explicitly state that the source code for its own proposed methodology is open-source, nor does it provide a link to it. |
| Open Datasets | Yes | "We consider four different scenarios, ranging from the basic benchmark MNIST (LeCun et al., 1998), through CIFAR10 (Krizhevsky et al., 2009), 9-class ImageNet (Santurkar et al., 2019), to full-fledged 1000-class ImageNet (Deng et al., 2009)." |
| Dataset Splits | No | The paper mentions "validation" in the context of the Validation Rate (VR) metric for perturbed images and the role of an "ensemble of multiple foundation models to validate the correctness of labels", but it does not give explicit train/validation/test split details (e.g., percentages or sample counts) needed for reproducibility. |
| Hardware Specification | Yes | "Our model evaluations are done on 8 NVIDIA V100 GPUs. With our Sparsified VQGAN model, our method is also feasible to work with a small amount of GPU resources. As shown in Appendix I, the proposed protocol can work on a single NVIDIA V100 GPU efficiently." |
| Software Dependencies | No | The paper mentions several software components and models, including VQGAN (Esser et al., 2021), CLIP (Radford et al., 2021), the timm library (Wightman, 2019), LASSO, and SAGA (Defazio et al., 2014) as a solver. However, it does not provide specific version numbers for these dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | "The perturbation step size for each iteration is 0.001. The total number of iterations allowed (computation budget B) is 50. For ImageNet, we resize all images to 224 × 224 px. We also center and re-scale the color values with µRGB = [0.485, 0.456, 0.406] and σRGB = [0.229, 0.224, 0.225]." |
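The normalization quoted in the Experiment Setup row is the standard ImageNet preprocessing. A minimal sketch of that step, assuming a uint8 H×W×3 RGB array as input (the `preprocess` function name is illustrative, not from the paper; resizing to 224 × 224 px would precede this step):

```python
import numpy as np

# ImageNet channel statistics quoted in the paper's setup.
MEAN_RGB = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD_RGB = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image: np.ndarray) -> np.ndarray:
    """Rescale a uint8 HxWx3 RGB image to [0, 1], then center and
    re-scale each color channel with the means and standard
    deviations above (broadcast over the last axis)."""
    x = image.astype(np.float32) / 255.0
    return (x - MEAN_RGB) / STD_RGB
```

In practice the same transform is typically applied via a library helper (e.g., a torchvision `Normalize` transform) rather than hand-rolled NumPy; the sketch only makes the arithmetic explicit.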