Improving Robustness with Adaptive Weight Decay

Authors: Mohammad Amin Ghiasi, Ali Shafahi, Reza Ardekani

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through rigorous experimentation, we demonstrate that AWD consistently yields enhanced robustness. By conducting experiments on diverse datasets and architectures, we provide empirical evidence to showcase the effectiveness of our approach in mitigating robust overfitting.
Researcher Affiliation | Industry | Amin Ghiasi, Ali Shafahi, Reza Ardekani; Apple, Cupertino, CA 95014; {mghiasi2, ashafahi, rardekani}@apple.com
Pseudocode | Yes | Algorithm 1 (Adaptive Weight Decay), reproduced below; a PyTorch sketch follows the table.
    Input: λ_awd > 0
    λ̄ ← 0
    for (x, y) ∈ loader do
        p ← model(x)                           ▷ Get the model's prediction.
        ℓ_main ← CrossEntropy(p, y)            ▷ Compute the cross-entropy loss.
        ∇w ← backward(ℓ_main)                  ▷ Compute the gradients of the main loss w.r.t. the weights.
        λ ← ‖∇w‖ · λ_awd / ‖w‖                 ▷ Compute this iteration's weight-decay hyperparameter.
        λ̄ ← 0.1 · λ + 0.9 · stop_gradient(λ̄)   ▷ Update the weighted average as a scalar.
        w ← w − lr · (∇w + λ̄ · w)              ▷ Update the network's parameters.
    end for
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor does it provide a link to a code repository.
Open Datasets | Yes | We focus on six datasets: SVHN, Fashion MNIST, Flowers, CIFAR-10, CIFAR-100, and Tiny ImageNet.
Dataset Splits | Yes | We reserve 10% of the training examples as a held-out validation set for early stopping and checkpoint selection.
Hardware Specification | No | The paper does not specify the hardware used for experiments, such as specific GPU or CPU models.
Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers, such as Python or PyTorch versions.
Experiment Setup | Yes | We train for 200 epochs, using an initial learning rate of 0.1 combined with a cosine learning-rate schedule (reflected in the sketch below).
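
PyTorch sketch of the quoted setup. The code below combines Algorithm 1 with the training schedule quoted in the Experiment Setup row (200 epochs, initial learning rate 0.1, cosine schedule). It is a minimal illustrative sketch, not the authors' implementation (none is linked): the framework (PyTorch), the use of torch.optim.SGD, the global-norm computation over all trainable parameters, and the names awd_train, lambda_awd, and lambda_bar are assumptions made for this example; λ_awd is the input hyperparameter of Algorithm 1 and is left as an argument since no value is quoted here.

import torch
import torch.nn.functional as F

def awd_train(model, loader, lambda_awd, epochs=200, lr=0.1, device="cpu"):
    """Train `model` with Adaptive Weight Decay (Algorithm 1) and a cosine LR schedule."""
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr)
    # Cosine learning-rate schedule over `epochs`, starting from lr = 0.1
    # (Experiment Setup row above).
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    lambda_bar = 0.0  # running weight-decay coefficient (λ̄ in Algorithm 1)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss_main = F.cross_entropy(model(x), y)  # main cross-entropy loss
            loss_main.backward()                      # gradients of the main loss w.r.t. the weights

            with torch.no_grad():
                # Global norms over all trainable parameters (an illustrative choice).
                grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
                weight_norm = torch.norm(torch.stack([p.norm() for p in params]))
                # Per-iteration coefficient: λ = ‖∇w‖ · λ_awd / ‖w‖
                lam = (grad_norm * lambda_awd / weight_norm).item()
                # Running average; the stop_gradient of Algorithm 1 is implicit here
                # because lambda_bar is kept as a detached Python float.
                lambda_bar = 0.1 * lam + 0.9 * lambda_bar
                # Fold the adaptive decay into the gradient so the SGD step performs
                # w ← w − lr · (∇w + λ̄ · w).
                for p in params:
                    p.grad.add_(p, alpha=lambda_bar)

            optimizer.step()
        scheduler.step()
    return model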