Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness
Authors: Suraj Srinivas, Sebastian Bordt, Himabindu Lakkaraju
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct extensive empirical analysis to confirm our theoretical analyses and additional hypotheses. |
| Researcher Affiliation | Academia | Suraj Srinivas, Harvard University, Cambridge, MA, ssrinivas@seas.harvard.edu; Sebastian Bordt, University of Tübingen, Tübingen AI Center, Tübingen, Germany, sebastian.bordt@uni-tuebingen.de; Himabindu Lakkaraju, Harvard University, Cambridge, MA, hlakkaraju@hbs.edu |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/tml-tuebingen/pags. |
| Open Datasets | Yes | We use CIFAR-10 [25], ImageNet and ImageNet-64 [26], and an MNIST dataset [27] with a distractor, inspired by [11]. |
| Dataset Splits | Yes | We use CIFAR-10 [25], ImageNet and ImageNet-64 [26], and an MNIST dataset [27] with a distractor, inspired by [11]. |
| Hardware Specification | No | The paper mentions training models on specific datasets but does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions software like Resnet18, AlexNet, and diffusion models, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | On CIFAR-10, we trained Resnet18 models for 200 epochs with an initial learning rate of 0.025. When training with gradient norm regularization or the smoothness penalty and large regularization constants, we reduced the learning rate proportional to the increase in the regularization constant. After 150 and 175 epochs, we decayed the learning rate by a factor of 10. On ImageNet-64x64, we trained Resnet18 models for 90 epochs with a batch size of 4096 and an initial learning rate of 0.1 that was decayed after 30 and 60 epochs, respectively. We used the same parameters for projected gradient descent (PGD) as in [29], that is, we took 3 steps with a step size of 2ϵ/3. On the MNIST dataset with a distractor, we trained a Resnet18 model for 9 epochs with an initial learning rate of 0.1 that was decayed after 3 and 6 epochs, respectively. We also trained an l2-adversarially robust Resnet18 with projected gradient descent (PGD). We randomly chose the perturbation budget ϵ ∈ {1, 4, 8} and took 10 steps with a step size of α = 2.5ϵ/10. |
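
The l2 PGD adversarial-training recipe quoted in the Experiment Setup row (10 steps, step size α = 2.5ϵ/10, perturbation budget ϵ sampled from {1, 4, 8}) can be summarized in a short sketch. This is a minimal PyTorch illustration assuming standard image batches of shape (N, C, H, W); the names `pgd_l2`, `model`, `x`, and `y` are hypothetical placeholders and are not taken from the authors' released code.

```python
import random

import torch
import torch.nn.functional as F


def pgd_l2(model, x, y, eps, steps=10):
    """l2-ball PGD attack with step size alpha = 2.5 * eps / steps, as in the setup above."""
    alpha = 2.5 * eps / steps
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        # Ascend along the per-example l2-normalized gradient direction.
        grad_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = delta + alpha * grad / grad_norm
        # Project the perturbation back onto the l2 ball of radius eps.
        delta_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = (delta * (eps / delta_norm).clamp(max=1.0)).detach().requires_grad_(True)
    return (x + delta).detach()


# Hypothetical use inside a training loop: sample eps per batch from {1, 4, 8},
# then compute the training loss on the adversarial examples.
# eps = random.choice([1.0, 4.0, 8.0])
# x_adv = pgd_l2(model, x, y, eps)
# loss = F.cross_entropy(model(x_adv), y)
```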