Ghost Noise for Regularizing Deep Neural Networks
Authors: Atli Kosson, Dongyang Fan, Martin Jaggi
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate the effectiveness of GBN by disentangling the induced Ghost Noise from normalization and quantitatively analyzing the distribution of noise as well as its impact on model performance. Experimentally, we find that GNI can deliver a stronger regularization effect than GBN, resulting in improved test performance across a wide range of training settings. |
| Researcher Affiliation | Academia | EPFL, Switzerland atli.kosson@epfl.ch, dongyang.fan@epfl.ch, martin.jaggi@epfl.ch |
| Pseudocode | Yes | Figure 2: A minimal PyTorch implementation of Ghost Noise Injection for convolutional activation maps without any performance optimizations. (A hedged sketch of such a module appears below the table.) |
| Open Source Code | No | The paper mentions using open-source libraries like PyTorch Image Models and vit-pytorch, but does not explicitly state that the code for their proposed methods (GNI, XBN) is open-source or provide a link. |
| Open Datasets | Yes | Krizhevsky, A.; and Hinton, G. 2009. CIFAR-100 (Canadian Institute For Advanced Research). Dataset available from https://www.cs.toronto.edu/~kriz/cifar.html. Deng, L. 2012. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6): 141–142. Table 2: Test accuracy for ResNet-20 CIFAR-10 (mean ± std % for 3 runs) and ResNet-50 ImageNet-1k for different normalization and noise setups. |
| Dataset Splits | Yes | Figure 3: CIFAR-100 ResNet-18 validation accuracy versus ghost batch size and dropout probability for different methods. The optimal N = 16 gives an accuracy boost of just over 1% on both the validation and the test sets (Table 1). The ghost batch size N was tuned on a validation set and varies considerably between the settings. |
| Hardware Specification | Yes | All experiments are run on a server with NVIDIA A100 GPUs. |
| Software Dependencies | Yes | Our experiments are performed using PyTorch (Paszke et al. 2019) in Python 3.9. For the Normalization-Free ResNet, SimpleViT, and ConvMixer, we use the implementations from PyTorch Image Models (Wightman 2019) and vit-pytorch (Wang 2023). (An instantiation sketch appears below the table.) |
| Experiment Setup | Yes | For the CIFAR-100 experiments we use ResNet-18 (He et al. 2016) with the SGD optimizer (momentum 0.9, weight decay 5e-4), trained for 200 epochs with a batch size of 256. The initial learning rate is 0.1, decayed by a factor of 10 at epochs 100 and 150. For ImageNet-1k, we use ResNet-50 (He et al. 2016) with the same optimizer and batch size, trained for 100 epochs with the learning rate decayed at epochs 30, 60, and 90. (A configuration sketch appears below the table.) |
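The Pseudocode row quotes the paper's Figure 2, a minimal PyTorch implementation of Ghost Noise Injection (GNI) for convolutional activation maps. The paper's code is not reproduced in this report; the snippet below is an unoptimized sketch of how ghost-batch-statistic noise could be injected, under the assumption that each sample is re-mapped from the statistics of a randomly drawn ghost batch to the full-batch statistics. The class name, default ghost batch size, and exact renormalization form are assumptions, not details confirmed by the excerpt.

```python
import torch
import torch.nn as nn


class GhostNoiseInjection(nn.Module):
    """Hedged sketch of ghost-batch-statistic noise injection (not the paper's Figure 2 code)."""

    def __init__(self, ghost_batch_size: int = 16, eps: float = 1e-5):
        super().__init__()
        self.ghost_batch_size = ghost_batch_size
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) convolutional activation maps; acts as identity at eval time.
        if not self.training or x.shape[0] < 2:
            return x
        B, C = x.shape[0], x.shape[1]
        # Full-batch statistics per channel.
        mu_b = x.mean(dim=(0, 2, 3), keepdim=True)                   # (1, C, 1, 1)
        var_b = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)   # (1, C, 1, 1)
        # Draw an independent random ghost batch (with replacement) for every sample.
        idx = torch.randint(B, (B, self.ghost_batch_size), device=x.device)
        ghost = x[idx]                                               # (B, N, C, H, W)
        mu_g = ghost.mean(dim=(1, 3, 4)).view(B, C, 1, 1)
        var_g = ghost.var(dim=(1, 3, 4), unbiased=False).view(B, C, 1, 1)
        # Re-map each sample from its ghost statistics to the full-batch statistics;
        # the deviation between the two sets of statistics acts as the injected noise.
        scale = torch.sqrt((var_b + self.eps) / (var_g + self.eps))
        return (x - mu_g) * scale + mu_b
```

In this sketch the expected transform is close to the identity, so the layer injects noise without changing how the following normalization layer behaves on average.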
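As noted in the Software Dependencies row, the Normalization-Free ResNet, SimpleViT, and ConvMixer baselines come from PyTorch Image Models (timm) and vit-pytorch. The sketch below only illustrates instantiating such models through those libraries; the specific model names and hyperparameters are assumptions, not values taken from the paper.

```python
import timm
from vit_pytorch import SimpleViT

# Illustrative instantiation via the cited libraries (model variants are assumptions).
nf_resnet = timm.create_model("nf_resnet50", num_classes=1000)        # Normalization-Free ResNet
convmixer = timm.create_model("convmixer_768_32", num_classes=1000)   # ConvMixer

simple_vit = SimpleViT(
    image_size=224, patch_size=16, num_classes=1000,
    dim=384, depth=12, heads=6, mlp_dim=1536,
)
```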
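The Experiment Setup row fully specifies the CIFAR-100 optimizer and schedule; a minimal sketch of that configuration in PyTorch follows. The torchvision ResNet-18 and the training-loop helper are stand-ins, since the paper's CIFAR variant of ResNet-18 and its data pipeline are not given in the excerpt.

```python
import torch
import torchvision

# Sketch of the quoted CIFAR-100 setup: SGD (momentum 0.9, weight decay 5e-4),
# 200 epochs, batch size 256, lr 0.1 decayed by 10x at epochs 100 and 150.
# torchvision's ResNet-18 is an assumption; the paper's CIFAR ResNet-18 may differ.
model = torchvision.models.resnet18(num_classes=100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

# for epoch in range(200):
#     train_one_epoch(model, optimizer, batch_size=256)  # hypothetical helper
#     scheduler.step()
```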