Explicit Regularisation in Gaussian Noise Injections

Authors: Alexander Camuto, Matthew Willetts, Umut Simsekli, Stephen J. Roberts, Chris C. Holmes

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise, and show that it penalises functions with high-frequency components in the Fourier domain; particularly in layers closer to a neural network's output. We show analytically and empirically that such regularisation produces calibrated classifiers with large classification margins.
Researcher Affiliation | Academia | Alexander Camuto (University of Oxford; Alan Turing Institute; acamuto@turing.ac.uk); Matthew Willetts (University of Oxford; Alan Turing Institute; mwilletts@turing.ac.uk); Umut Şimşekli (University of Oxford; Institut Polytechnique de Paris; umut.simsekli@telecom-paris.fr); Stephen Roberts (University of Oxford; Alan Turing Institute; sjrob@robots.ox.ac.uk); Chris Holmes (University of Oxford; Alan Turing Institute; cholmes@stats.ox.ac.uk)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | In (a,b) we plot R(·) vs E[C(·)] at initialisation for 6-layer MLPs with GNIs at each 256-neuron layer, with the same variance σ² ∈ [0.1, 0.25, 1.0, 4.0] at each layer. Each point corresponds to one of 250 different network initialisations acting on a batch of size 32 for the CIFAR10 classification dataset and the Boston House Prices (BHP) regression dataset. Figure (a) shows the test set loss for convolutional models (CONV) and 4-layer MLPs trained on SVHN with R(·), with GNIs for σ² = 0.1, and with no noise (Baseline).
Dataset Splits | No | The paper mentions using a 'test set' but does not explicitly provide the training, validation, and test dataset splits (e.g., percentages or exact counts) for any of the datasets used.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper does not specify any software dependencies with their version numbers.
Experiment Setup | Yes | In (a,b) we plot R(·) vs E[C(·)] at initialisation for 6-layer MLPs with GNIs at each 256-neuron layer, with the same variance σ² ∈ [0.1, 0.25, 1.0, 4.0] at each layer. In (c,d) we plot ratio = |E[C(·)]| / R(·) over the first 100 training iterations for 10 randomly initialised networks. Shading corresponds to the standard deviation of values over the 10 networks. Figure (a) shows the test set loss for convolutional models (CONV) and 4-layer MLPs trained on SVHN with R(·), with GNIs for σ² = 0.1, and with no noise (Baseline). We use 6-layer-deep, 256-unit-wide ReLU networks on the same dataset as in Figure 4, trained with GNIs (GNI) and without them (Baseline). A minimal illustrative sketch of such a GNI network appears below.
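
Since the paper releases no code, the following is only a minimal sketch (written in PyTorch, which the paper does not specify) of the kind of network described in the experiment setup: a deep ReLU MLP with 256-unit hidden layers and zero-mean Gaussian noise of fixed variance injected into each hidden activation. The class name GNIMLP, the argument sigma2, and the choice to inject noise only during training are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of Gaussian noise injection (GNI) in an MLP.
# Assumption: zero-mean Gaussian noise with variance sigma2 is added to every
# hidden-layer activation during training only. Names are illustrative.
import torch
import torch.nn as nn


class GNIMLP(nn.Module):
    def __init__(self, in_dim, out_dim, width=256, depth=6, sigma2=0.1):
        super().__init__()
        self.sigma = sigma2 ** 0.5  # standard deviation of the injected noise
        dims = [in_dim] + [width] * depth
        self.hidden = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )
        self.out = nn.Linear(width, out_dim)

    def forward(self, x):
        for layer in self.hidden:
            x = torch.relu(layer(x))
            if self.training:
                # Inject i.i.d. Gaussian noise into the hidden activations.
                x = x + self.sigma * torch.randn_like(x)
        return self.out(x)


# Example: flattened CIFAR10-sized inputs (3*32*32) with 10 classes,
# on a batch of size 32 as in the paper's figures.
model = GNIMLP(in_dim=3 * 32 * 32, out_dim=10, width=256, depth=6, sigma2=0.1)
logits = model(torch.randn(32, 3 * 32 * 32))
```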