Do Perceptually Aligned Gradients Imply Robustness?

Authors: Roy Ganz, Bahjat Kawar, Michael Elad

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on multiple datasets and architectures validate that models with aligned gradients exhibit significant robustness, exposing the surprising bidirectional connection between PAG and robustness. Lastly, we show that better gradient alignment leads to increased robustness and harness this observation to boost the robustness of existing adversarial training techniques.
Researcher Affiliation | Academia | 1: Electrical Engineering Department, Technion, Haifa, Israel; 2: Computer Science Department, Technion, Haifa, Israel.
Pseudocode | No | The paper describes its methods using prose and mathematical equations (e.g., Equations (1), (4), and (5)) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/royg27/PAG-ROB.
Open Datasets | Yes | We experiment with models from different architecture families, Convolutional Neural Networks and Vision Transformers (ViT) (Dosovitskiy et al., 2021), and multiple datasets: CIFAR-10, STL, and CIFAR-100. (See the dataset-loading sketch after this table.)
Dataset Splits | Yes | To evaluate performance, we generate a balanced test set from the same distribution consisting of 600 samples. ... STL ... has 5,000 training and 8,000 test images.
Hardware Specification | Yes | We use a single Tesla V100 GPU. ... We use two NVIDIA RTX A4000 16GB GPUs for each experiment.
Software Dependencies | No | The paper mentions several GitHub repositories for implementations and tools used (e.g., improved-diffusion, MLP-Mixer-CIFAR, pytorch-vgg-cifar10, auto-attack, TRADES) but does not provide specific version numbers for software dependencies like PyTorch, Python, or CUDA.
Experiment Setup | Yes | We do so for 100 epochs with a batch size of 128, using the Adam optimizer, a learning rate of 0.01, and the same seed for both training processes. ... For all the tested datasets, we train the classifier (ResNet-18 or ViT) for 100 epochs, using SGD with a learning rate of 0.01, a momentum of 0.9, and a weight decay of 0.0001. In addition, we use the standard augmentations for these datasets: random cropping with padding of 4 and random horizontal flipping with a probability of 0.5. We use a batch size of 64 for CIFAR-10 and CIFAR-100 and 32 for STL. (See the training-setup sketch after this table.)
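
For readers reproducing the data pipeline, below is a minimal sketch of loading the three public datasets named in the Open Datasets row via torchvision. The root path and the plain ToTensor transform are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: loading CIFAR-10, CIFAR-100, and STL-10 with torchvision.
# The "./data" root and the bare ToTensor transform are assumptions for illustration.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# CIFAR-10 / CIFAR-100: 50,000 training and 10,000 test images each.
cifar10_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
cifar10_test = datasets.CIFAR10(root="./data", train=False, download=True, transform=to_tensor)
cifar100_train = datasets.CIFAR100(root="./data", train=True, download=True, transform=to_tensor)
cifar100_test = datasets.CIFAR100(root="./data", train=False, download=True, transform=to_tensor)

# STL-10 (labeled portion): 5,000 training and 8,000 test images, as quoted in the Dataset Splits row.
stl_train = datasets.STL10(root="./data", split="train", download=True, transform=to_tensor)
stl_test = datasets.STL10(root="./data", split="test", download=True, transform=to_tensor)
```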
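The Experiment Setup row can likewise be summarized as a short training-configuration sketch, assuming a CIFAR-10 classifier. The stock torchvision resnet18 stands in for the paper's ResNet-18 (which may be a CIFAR-adapted variant), and no normalization is applied since none is specified in the quoted setup; everything else mirrors the quoted hyperparameters (SGD, lr 0.01, momentum 0.9, weight decay 1e-4, 100 epochs, batch size 64, random crop with padding 4, horizontal flip with p=0.5).

```python
# Minimal sketch of the quoted classifier training setup on CIFAR-10.
# torchvision's stock resnet18 is an assumption; the paper's architecture may differ.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard augmentations quoted in the paper: random crop (padding 4) + horizontal flip (p=0.5).
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=train_tf)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet18(num_classes=10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```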