Anticorrelated Noise Injection for Improved Generalization
Authors: Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an extensive set of experiments ranging from shallow neural networks to deep architectures with real data (e.g. CIFAR-10) and we demonstrate that Anti-PGD, indeed, reliably finds minima that are both flatter and generalize better than the ones found by standard GD or PGD. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, ETH Zurich, Switzerland; ²INRIA, École Normale Supérieure, PSL Research University, Paris, France; ³Department of Mathematics, University of Oslo, Norway; ⁴Department of Mathematics and Computer Science, University of Basel, Switzerland. |
| Pseudocode | No | The paper describes the algorithms via update equations (Eqs. 1 and 2 for PGD and Anti-PGD) but does not include any structured pseudocode blocks or algorithm listings. (A minimal NumPy sketch of these two updates appears after the table.) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of source code for the methodology described. |
| Open Datasets | Yes | We conduct an extensive set of experiments ranging from shallow neural networks to deep architectures with real data (e.g. CIFAR-10) |
| Dataset Splits | No | The paper mentions 'train loss' and 'test loss' but does not explicitly specify the division of data into training, validation, and test sets, nor does it provide percentages or sample counts for these splits. |
| Hardware Specification | No | To approximate full-batch gradient descent we use a very large batch size of 7500 samples (i.e. until saturation of 5 GPUs). While '5 GPUs' is mentioned, no specific GPU models (e.g., NVIDIA A100, Tesla V100) or other hardware specifications (CPU, RAM) are provided. |
| Software Dependencies | No | The paper mentions using 'a simple SGD optimizer (with momentum 0.9)' and a 'ResNet18-like architecture' but does not specify software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x) that would be needed for reproducibility. |
| Experiment Setup | Yes | Here, to keep things simple, we train with a simple SGD optimizer (with momentum 0.9), and select a learning rate of 0.05. To approximate full-batch gradient descent we use a very large batch size of 7500 samples (i.e. until saturation of 5 GPUs). For SGD, we instead select a batch size of 128, and keep the learning rate at 0.05. For convergence of the test accuracy and the Hessian trace, it is convenient to kill the noise injection after 250 epochs so that the optimizer converges to the nearest minimum. In this experiment, we keep the parameter settings as in the last paragraph, but instead consider injecting noise only after 75 epochs. (A hypothetical PyTorch reconstruction of these settings appears after the table.) |
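
The "Pseudocode" row above notes that PGD and Anti-PGD are specified only as update equations (Eqs. 1 and 2 in the paper): PGD adds an i.i.d. perturbation at each step, while Anti-PGD adds the increment of that sequence, making consecutive perturbations anticorrelated. The following minimal NumPy sketch restates those two updates; the function names, default hyperparameters, and toy quadratic loss are illustrative choices, not taken from the paper.

```python
import numpy as np

def pgd(x0, grad, lr=0.05, sigma=0.1, n_steps=1000, rng=None):
    """Perturbed GD (Eq. 1): x_{k+1} = x_k - lr*grad(x_k) + xi_{k+1},
    with i.i.d. zero-mean Gaussian perturbations xi_k."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - lr * grad(x) + rng.normal(0.0, sigma, size=x.shape)
    return x

def anti_pgd(x0, grad, lr=0.05, sigma=0.1, n_steps=1000, rng=None):
    """Anti-PGD (Eq. 2): inject the increment xi_{k+1} - xi_k of the same
    i.i.d. sequence, so consecutive perturbations are anticorrelated."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    xi_prev = np.zeros_like(x)
    for _ in range(n_steps):
        xi = rng.normal(0.0, sigma, size=x.shape)
        x = x - lr * grad(x) + (xi - xi_prev)
        xi_prev = xi
    return x

# Toy usage on the quadratic loss L(x) = 0.5 * ||x||^2 (illustrative only).
grad = lambda x: x
print(anti_pgd(np.ones(5), grad))
```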
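
For the "Experiment Setup" row, the quoted hyperparameters (SGD with momentum 0.9, learning rate 0.05, batch size 7500 to approximate full-batch GD or 128 for SGD, noise injection switched off after 250 epochs) can be assembled into a training loop. Since the paper releases no code, the sketch below is a hypothetical PyTorch reconstruction: the data pipeline, the noise scale SIGMA, the total epoch count, and the exact placement of the parameter perturbation are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F
import torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=torchvision.transforms.ToTensor())
# Quoted: batch size 7500 to approximate full-batch GD (128 for the SGD runs).
loader = torch.utils.data.DataLoader(train_set, batch_size=7500, shuffle=True)

model = torchvision.models.resnet18(num_classes=10).to(device)  # "ResNet18-like"
# Quoted: SGD optimizer with momentum 0.9 and learning rate 0.05.
opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

SIGMA = 0.01           # noise scale -- not given in the quoted excerpt, placeholder
NOISE_OFF_EPOCH = 250  # "kill the noise injection after 250 epochs"
prev_noise = {n: torch.zeros_like(p) for n, p in model.named_parameters()}

for epoch in range(300):                      # total epoch count is an assumption
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
        if epoch < NOISE_OFF_EPOCH:           # anticorrelated perturbation, as above
            with torch.no_grad():
                for n, p in model.named_parameters():
                    xi = SIGMA * torch.randn_like(p)
                    p.add_(xi - prev_noise[n])
                    prev_noise[n] = xi
```

The delayed-injection variant quoted in the same row ("injecting noise only after 75 epochs") would simply change the condition to `75 <= epoch < NOISE_OFF_EPOCH`.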