NETWORK INSENSITIVITY TO PARAMETER NOISE VIA PARAMETER ATTACK DURING TRAINING
Authors: Julian Büchel, Fynn Firouz Faber, Dylan Richard Muir
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare against previous approaches for producing parameter insensitivity such as dropout, weight smoothing and introducing parameter noise during training. We show that our approach produces models that are more robust to random mismatch-induced parameter variation as well as to targeted parameter variation. Our approach finds minima in flatter locations in the weight-loss landscape compared with other approaches, highlighting that the networks found by our technique are less sensitive to parameter perturbation. Our work provides an approach to deploy neural network architectures to inference devices that suffer from computational non-idealities, with minimal loss of performance. This method will enable deployment at scale to novel energy-efficient computational substrates, promoting cheaper and more prevalent edge inference. |
| Researcher Affiliation | Collaboration | Julian Büchel, IBM Research Zurich; SynSense, Zürich, Switzerland; ETH Zürich, Switzerland; jbu@zurich.ibm.com. Fynn Faber, ETH Zürich, Switzerland; faberf@ethz.ch. Dylan R. Muir, SynSense, Zürich, Switzerland; dylan.muir@synsense.ai |
| Pseudocode | Yes | Algorithm 1 illustrates the training procedure in more detail. |
| Open Source Code | Yes | Code to reproduce all experiments described in this work is provided at https://github.com/jubueche/BPTT-Lipschitzness and https://github.com/jubueche/Resnet32-ICLR |
| Open Datasets | Yes | Speech command detection of 6 classes (Warden, 2018); ECG-anomaly detection on 4 classes (Bauer et al., 2019); Fashion-MNIST (F-MNIST): clothing-image classification on 10 classes (Xiao et al., 2017); and the CIFAR-10 colour-image classification task (Krizhevsky, 2009). |
| Dataset Splits | No | All other models were trained for the same number of epochs (no early stopping) and the model with the highest validation accuracy was selected. While a validation set is mentioned, specific details about its size or percentage split from the main datasets are not provided. |
| Hardware Specification | No | The paper discusses target hardware like 'neuromorphic processors', 'compute-in-memory crossbar arrays of memristors', and 'Phase Change Memory (PCM)-based CiM simulator'. However, it does not specify the actual compute hardware (e.g., specific GPU models, CPUs, or cloud instances) used to perform the training and experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma & Ba, 2015)' and 'PyTorch' (in reference to the code provided on GitHub), but it does not provide specific version numbers for these or other software dependencies (e.g., Python, CUDA) that would be needed for reproducible setup. |
| Experiment Setup | Yes | We compared several training and attack methods, beginning with a standard Stochastic Gradient Descent (SGD) approach using the Adam optimizer (Kingma & Ba, 2015) (Standard). Learning rate varied by architecture, but was kept constant when comparing training methods on an architecture. We examined networks trained with dropout (Srivastava et al., 2014), AWP (Wu et al., 2020), AMP (Zheng et al., 2020), ABCD (Cicek & Soatto, 2019), and Entropy-SGD (Chaudhari et al., 2019). ...A dropout probability of 0.3 was used in the dropout models and γ in AWP was set to 0.1. When Gaussian noise was applied to the weights during the forward pass (Murray & Edwards, 1994) a relative standard deviation of 0.3 times the weight magnitude was used (ηtrain = 0.3). For Entropy SGD, we set the number of inner iterations to 10 with a Langevin learning rate of 0.1. Because Entropy-SGD and ABCD have inner loops, the number of total epochs were reduced accordingly. All other models were trained for the same number of epochs (no early stopping) and the model with the highest validation accuracy was selected. |
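The paper's central idea, an adversarial attack on the *parameters* during training (Algorithm 1), can be illustrated with a minimal sketch. This is a hedged, scalar toy version using projected gradient ascent within a relative perturbation bound; the function name `adversarial_parameter_attack`, the sign-gradient step, and the scalar setting are illustrative assumptions, not the paper's actual Algorithm 1, which operates on full network weights via autograd.

```python
def adversarial_parameter_attack(theta, grad_loss, eps_rel=0.1, steps=5):
    """Projected gradient ascent on a scalar parameter: find a perturbation
    delta within the relative bound eps_rel * |theta| that increases the
    loss. Illustrative sketch only; grad_loss(theta) returns dL/dtheta."""
    bound = eps_rel * abs(theta)
    delta = 0.0
    step = bound / steps
    for _ in range(steps):
        # step in the sign of the gradient (ascend the loss), then
        # project back into the allowed interval [-bound, bound]
        delta += step * (1.0 if grad_loss(theta + delta) > 0 else -1.0)
        delta = max(-bound, min(bound, delta))
    return theta + delta

# Toy loss L(theta) = (theta - 1)^2, so dL/dtheta = 2*(theta - 1).
# Starting from theta = 2.0 with a 10% bound, the attack pushes the
# parameter to the worst case within the bound.
theta_adv = adversarial_parameter_attack(2.0, lambda t: 2.0 * (t - 1.0))
print(theta_adv)  # 2.2
```

During training, the network would then be optimized against the loss evaluated at the attacked parameters, encouraging flat minima that tolerate such perturbations.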
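The baseline noise-injection setup quoted above (Murray & Edwards, 1994; η_train = 0.3) perturbs each weight with Gaussian noise whose standard deviation is 0.3 times that weight's magnitude, applied during the forward pass. A pure-Python sketch, assuming a plain linear layer; the function `noisy_forward` and its signature are illustrative, the actual experiments use PyTorch:

```python
import random

def noisy_forward(weights, x, eta=0.3, rng=None):
    """Compute y = W x where each weight w is replaced by
    w + N(0, (eta * |w|)^2) for this forward pass only, i.e. the noise
    std scales with the weight magnitude (relative std eta)."""
    rng = rng or random.Random(0)
    y = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            w_noisy = w + rng.gauss(0.0, eta * abs(w))
            acc += w_noisy * xi
        y.append(acc)
    return y

# With eta = 0 the layer reduces to an exact matrix-vector product.
print(noisy_forward([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], eta=0.0))
# [3.0, 7.0]
```

A fresh noise sample is drawn on every forward pass, so the gradients seen during training average over many perturbed copies of the weights.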