Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Do better ImageNet classifiers assess perceptual similarity better?
Authors: Manoj Kumar, Neil Houlsby, Nal Kalchbrenner, Ekin Dogus Cubuk
TMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present a large-scale empirical study to assess how well ImageNet classifiers perform on perceptual similarity. First, we observe an inverse correlation between ImageNet accuracy and Perceptual Scores of modern networks such as ResNets, EfficientNets, and Vision Transformers: that is, better classifiers achieve worse Perceptual Scores. Then, we examine the ImageNet accuracy/Perceptual Score relationship on varying the depth, width, number of training steps, weight decay, label smoothing, and dropout. |
| Researcher Affiliation | Industry | Manoj Kumar EMAIL Neil Houlsby EMAIL Nal Kalchbrenner EMAIL Ekin D. Cubuk EMAIL Google Research, Brain Team |
| Pseudocode | Yes | Appendix L Code Snippets: We present code snippets for the different distance functions used in our paper. Listing 1: Code snippets for different perceptual functions. def perceptual(tensor1, tensor2, eps=1e-10): """Default perceptual distance function. |
| Open Source Code | Yes | Appendix L Code Snippets: We present code snippets for the different distance functions used in our paper. Listing 1: Code snippets for different perceptual functions. def perceptual(tensor1, tensor2, eps=1e-10): """Default perceptual distance function. |
| Open Datasets | Yes | We perform a suite of experiments on BAPPS, a large dataset of human-evaluated perceptual judgements (Zhang et al., 2018). ... The BAPPS Dataset (Zhang et al., 2018) is a dataset of 161k patches derived by applying exclusively low-level distortions to the MIT-Adobe 5k dataset (Bychkovsky et al., 2011) for training and the RAISE1k dataset (Dang-Nguyen et al., 2015) for validation. ... ImageNet (Russakovsky et al., 2015) is the cornerstone of modern supervised learning... |
| Dataset Splits | Yes | The BAPPS Dataset (Zhang et al., 2018) is a dataset of 161k patches derived by applying exclusively low-level distortions to the MIT-Adobe 5k dataset (Bychkovsky et al., 2011) for training and the RAISE1k dataset (Dang-Nguyen et al., 2015) for validation. ... The train set consists of the traditional and CNN-based distortions and the validation set contains all 6 families. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper provides code snippets using 'numpy' but does not specify version numbers for any software dependencies, libraries, or frameworks used for the experiments. |
| Experiment Setup | Yes | Appendix M Default Hyper-parameters: We provide the default training hyper-parameters for the ResNets, EfficientNets, and Vision Transformers in Tables 2, 3, 4, and 5. Table 2: ResNet default hyperparameters (Batch Size 1024, Base Learning Rate 0.1, Train Steps 112590, Momentum 0.9, Weight Decay 0.0001, Label Smoothing 0.0, LR Schedule Step-wise Decay, Batch-Norm Momentum 0.9). Similar tables are provided for EfficientNet, ViT-B/8, and ViT-L/4. |
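The snippet quoted in the Pseudocode and Open Source Code rows is truncated at the function signature. The sketch below is a hedged reconstruction of what such a "default perceptual distance" commonly computes: channel-normalized feature differences averaged over spatial positions. Everything beyond the quoted signature (`perceptual(tensor1, tensor2, eps=1e-10)`) is an assumption, not the paper's actual implementation.

```python
import numpy as np

def perceptual(tensor1, tensor2, eps=1e-10):
    """Default perceptual distance function (illustrative reconstruction).

    Assumes tensor1 and tensor2 are network activations of shape (H, W, C).
    The body below is a common LPIPS-style formulation, not the paper's
    verbatim code: normalize each spatial feature vector to unit length,
    then average the squared differences over spatial positions.
    """
    n1 = tensor1 / (np.linalg.norm(tensor1, axis=-1, keepdims=True) + eps)
    n2 = tensor2 / (np.linalg.norm(tensor2, axis=-1, keepdims=True) + eps)
    # Sum squared channel differences, then average over the spatial grid.
    return np.mean(np.sum((n1 - n2) ** 2, axis=-1))
```

By construction the distance is zero for identical inputs, symmetric in its arguments, and invariant to per-position rescaling of the features, which is the usual motivation for the channel normalization step.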
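For readers reconstructing the experiment setup, the ResNet defaults quoted from the paper's Table 2 can be collected into a single config. The values below are exactly those quoted above; the dict structure and name `RESNET_DEFAULTS` are illustrative choices, not from the paper.

```python
# ResNet default hyperparameters as quoted from the paper's Table 2.
# The dict layout itself is a hypothetical convenience, not the authors' code.
RESNET_DEFAULTS = {
    "batch_size": 1024,
    "base_learning_rate": 0.1,
    "train_steps": 112_590,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "label_smoothing": 0.0,
    "lr_schedule": "stepwise_decay",
    "batch_norm_momentum": 0.9,
}
```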