Efficient local linearity regularization to overcome catastrophic overfitting
Authors: Elias Abad Rocamora, Fanghui Liu, Grigorios Chrysos, Pablo M. Olmos, Volkan Cevher
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our thorough experimental validation demonstrates that our work does not suffer from catastrophic overfitting (CO), even in challenging settings where previous works suffer from it. We also notice that adapting our regularization parameter during training (ELLE-A) greatly improves the performance, especially in large ϵ setups. |
| Researcher Affiliation | Academia | ¹LIONS, École Polytechnique Fédérale de Lausanne; ²University of Warwick; ³University of Wisconsin-Madison; ⁴Universidad Carlos III de Madrid |
| Pseudocode | Yes | Algorithm 1: ELLE (ELLE-A) adversarial training. Pseudo-code in teal is only run for ELLE-A. (An illustrative sketch of a local-linearity penalty is given after the table.) |
| Open Source Code | Yes | Our implementation is available at https://github.com/LIONS-EPFL/ELLE. |
| Open Datasets | Yes | We train the architectures PreActResNet-18 (PRN), ResNet-50 (He et al., 2016) and WideResNet-28-10 (WRN) (Zagoruyko and Komodakis, 2016) on CIFAR10/100 (Krizhevsky, 2009), SVHN (Netzer et al., 2011) and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | We evaluate the PGD-20 adversarial accuracy on a 1024-image validation sample extracted from the training set of each dataset. (A sketch of such a split is given after the table.) |
| Hardware Specification | Yes | ImageNet experiments were conducted on a single machine with an NVIDIA A100 SXM4 80 GB GPU. For the rest of the experiments we used a single machine with an NVIDIA A100 SXM4 40 GB GPU. |
| Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers (e.g., Python version, specific deep learning framework version like PyTorch or TensorFlow). |
| Experiment Setup | Yes | We use the SGD optimizer with momentum 0.9 and weight decay 5 × 10⁻⁴. Short: From Andriushchenko and Flammarion (2020), with 30 and 15 epochs for CIFAR10/100 and SVHN respectively, batch size of 128 and a cyclic learning rate schedule with a maximum learning rate of 0.2. Long: From Rice et al. (2020), with 200 epochs, batch size of 128, a constant learning rate of 0.1 for CIFAR10/100 and 0.01 for SVHN, decayed by a factor of 10 at epochs 100 and 150. (A configuration sketch for the short schedule is given after the table.) |
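The Pseudocode row points to Algorithm 1 (ELLE / ELLE-A adversarial training) without reproducing it. For orientation only, the sketch below shows one way to implement a local-linearity penalty of the kind the paper regularizes with: sample two random points in the ℓ∞ ball around the input and a random interpolation between them, then penalize the squared gap between the loss at the interpolated point and the linear interpolation of the endpoint losses. The function name `local_linearity_penalty`, the per-sample α sampling, and the clamping to [0, 1] are assumptions made for illustration, not the authors' Algorithm 1; the reference implementation is in the linked repository.

```python
import torch
import torch.nn.functional as F

def local_linearity_penalty(model, x, y, eps):
    """Illustrative local-linearity penalty (sketch, not the authors' Algorithm 1).

    Samples two random points in the eps-ball around x and a random convex
    combination of them, then penalizes the squared deviation of the loss at
    the interpolated point from the linear interpolation of the endpoint losses.
    """
    # Two random perturbations inside the l_inf ball of radius eps.
    delta_a = (torch.rand_like(x) * 2 - 1) * eps
    delta_b = (torch.rand_like(x) * 2 - 1) * eps
    x_a = torch.clamp(x + delta_a, 0, 1)
    x_b = torch.clamp(x + delta_b, 0, 1)

    # Random convex combination of the two points (one alpha per sample).
    alpha = torch.rand(x.shape[0], 1, 1, 1, device=x.device)
    x_c = alpha * x_a + (1 - alpha) * x_b

    # Per-sample losses at the three points.
    loss_a = F.cross_entropy(model(x_a), y, reduction="none")
    loss_b = F.cross_entropy(model(x_b), y, reduction="none")
    loss_c = F.cross_entropy(model(x_c), y, reduction="none")

    # If the loss were locally linear, loss_c would equal the interpolation.
    a = alpha.view(-1)
    lin_err = loss_c - (a * loss_a + (1 - a) * loss_b)
    return (lin_err ** 2).mean()
```

During training such a penalty would be scaled by a regularization weight λ and added to the (adversarial) cross-entropy loss; the Research Type row notes that ELLE-A adapts this regularization parameter during training.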
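The Dataset Splits row mentions a 1024-image validation sample drawn from the training set, used to monitor PGD-20 adversarial accuracy. A minimal reconstruction of such a split, assuming CIFAR10, torchvision, and a fixed seed (all assumptions; the paper does not specify these details), could look like:

```python
import torch
from torch.utils.data import DataLoader, random_split
import torchvision
import torchvision.transforms as T

# Carve a 1024-image validation sample out of the CIFAR10 training set.
# The seed and the plain ToTensor transform are assumed, not from the paper.
transform = T.ToTensor()
train_full = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)

val_size = 1024
generator = torch.Generator().manual_seed(0)  # assumed seed for reproducibility
train_set, val_set = random_split(
    train_full, [len(train_full) - val_size, val_size], generator=generator
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False)
```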
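The Experiment Setup row quotes the short training recipe. A minimal PyTorch sketch of that configuration, assuming a triangular cyclic schedule as in common fast adversarial training recipes and using a placeholder model (both assumptions; the paper only states that the schedule is cyclic with a maximum learning rate of 0.2), could look like:

```python
import torch
import torch.nn as nn

# Sketch of the "Short" recipe: SGD with momentum 0.9 and weight decay 5e-4,
# batch size 128, 30 epochs on CIFAR10/100, cyclic learning rate peaking at 0.2.
model = nn.Linear(3 * 32 * 32, 10)  # placeholder for PreActResNet-18
epochs, steps_per_epoch = 30, 50_000 // 128

optimizer = torch.optim.SGD(
    model.parameters(), lr=0.2, momentum=0.9, weight_decay=5e-4
)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=0.0,
    max_lr=0.2,
    step_size_up=(epochs * steps_per_epoch) // 2,
    step_size_down=(epochs * steps_per_epoch) // 2,
    cycle_momentum=False,  # keep momentum fixed at 0.9
)
# scheduler.step() would be called once per batch during training.
```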