Relative stability toward diffeomorphisms indicates performance in deep nets

Authors: Leonardo Petrini, Alessandro Favero, Mario Geiger, Matthieu Wyart

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Our central result is that after training, Rf correlates very strongly with the test error ϵt: during training, Rf is reduced by several decades in current state-of-the-art (SOTA) architectures on four benchmark datasets: MNIST (LeCun et al., 1998), Fashion-MNIST (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009). For CIFAR-10 we study 15 known architectures and find empirically that ϵt ≈ 0.2√Rf, suggesting that obtaining a small Rf is important to achieve good performance. (A sketch of the Rf stability ratio is given below the table.)
Researcher Affiliation | Academia | Leonardo Petrini, Alessandro Favero, Mario Geiger, Matthieu Wyart; Institute of Physics, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland; {name.surname}@epfl.ch
Pseudocode | No | The paper describes methods and mathematical formulations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The library implementing diffeomorphisms on images is available online at github.com/pcsl-epfl/diffeomorphism. The code for training the neural nets can be found at github.com/leonardopetrini/diffeo-sota, and the corresponding pre-trained models at doi.org/10.5281/zenodo.5589870.
Open Datasets | Yes | Our central result is that after training, Rf correlates very strongly with the test error ϵt: during training, Rf is reduced by several decades in current state-of-the-art (SOTA) architectures on four benchmark datasets: MNIST (LeCun et al., 1998), Fashion-MNIST (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009).
Dataset Splits | No | The paper does not explicitly provide details on train/validation/test splits beyond mentioning the use of a "test set" and varying the training set size P.
Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software components such as SGD and the cross-entropy loss, but it does not specify version numbers for any libraries or frameworks (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | All neural nets are trained using stochastic gradient descent (SGD) with momentum 0.9 and a batch size of 128. The learning rate schedule is cosine decay with warm restarts (Loshchilov and Hutter, 2016), starting from 0.1 for CIFAR-10 and F-MNIST and from 0.01 for MNIST. We train the nets for 200 epochs. (A training-configuration sketch follows the table.)
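
The quantity Rf quoted in the Research Type row is, per the paper, the relative stability of the trained network to small input diffeomorphisms, measured against its stability to random noise of the same magnitude (Rf = Df / Gf). The snippet below is a minimal sketch of that ratio as I read the definition; the function name `relative_stability`, the generic model `f`, the batch `x`, and the `deform()` helper (standing in for the smooth deformations implemented in github.com/pcsl-epfl/diffeomorphism) are all assumptions, not the authors' exact code.

```python
# Minimal sketch of the relative-stability ratio R_f = D_f / G_f (assumed reading
# of the paper's definition, not the authors' implementation).
import torch

def relative_stability(f, x, deform, n_samples=16):
    """f: trained model, x: image batch (N, C, H, W), deform: assumed helper
    applying a small random smooth diffeomorphism to each image."""
    with torch.no_grad():
        fx = f(x)                                     # reference outputs f(x)
        num, den = 0.0, 0.0
        for _ in range(n_samples):
            x_tau = deform(x)                         # tau(x): deformed inputs
            # per-image deformation magnitude ||tau(x) - x||
            delta = (x_tau - x).flatten(1).norm(dim=1, keepdim=True)
            # isotropic noise rescaled to the same per-image norm
            eta = torch.randn_like(x).flatten(1)
            eta = (eta / eta.norm(dim=1, keepdim=True) * delta).view_as(x)
            num += ((f(x_tau) - fx) ** 2).mean().item()   # D_f-like term: diffeo sensitivity
            den += ((f(x + eta) - fx) ** 2).mean().item() # G_f-like term: noise sensitivity
    return num / den                                  # relative stability R_f
```

Under this reading, a small Rf means the network is far less sensitive to smooth deformations than to generic perturbations of equal size, which is the property the paper correlates with low test error (ϵt ≈ 0.2√Rf on CIFAR-10).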
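
The hyperparameters quoted in the Experiment Setup row map onto a standard PyTorch training loop. Below is a minimal sketch assuming a generic model and data loader; the specific scheduler class (`CosineAnnealingWarmRestarts`) and its restart period `T_0=10` are assumptions, since the paper only names "cosine decay with warm restarts (Loshchilov and Hutter, 2016)" without further detail.

```python
# Sketch of the reported setup: SGD with momentum 0.9, batch size 128,
# cosine decay with warm restarts, 200 epochs. T_0=10 is an assumed value;
# the paper does not state the restart period.
import torch
from torch import nn, optim

def train(model, train_loader, epochs=200, lr=0.1, device="cuda"):
    model.to(device)
    criterion = nn.CrossEntropyLoss()                 # cross-entropy loss, as in the paper
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
    for epoch in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()                              # advance the schedule once per epoch
    return model
```

The batch size of 128 would be set on the data loader rather than the optimizer, e.g. `torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)`, and the initial learning rate would be 0.1 for CIFAR-10 and F-MNIST or 0.01 for MNIST, as reported.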