Relative stability toward diffeomorphisms indicates performance in deep nets
Authors: Leonardo Petrini, Alessandro Favero, Mario Geiger, Matthieu Wyart
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our central result is that after training, Rf correlates very strongly with the test error ϵt: during training, Rf is reduced by several decades in current State Of The Art (SOTA) architectures on four benchmark datasets: MNIST (LeCun et al., 1998), FashionMNIST (Xiao et al., 2017), CIFAR10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009). For CIFAR10 we study 15 known architectures and find empirically that ϵt ≈ 0.2 √Rf, suggesting that obtaining a small Rf is important to achieve good performance. (A hedged sketch of how Rf can be estimated appears after this table.) |
| Researcher Affiliation | Academia | Leonardo Petrini, Alessandro Favero, Mario Geiger, Matthieu Wyart Institute of Physics École Polytechnique Fédérale de Lausanne 1015 Lausanne, Switzerland {name.surname}@epfl.ch |
| Pseudocode | No | The paper describes methods and mathematical formulations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The library implementing diffeomorphisms on images is available online at github.com/pcsl-epfl/diffeomorphism. The code for training neural nets can be found at github.com/leonardopetrini/diffeo-sota and the corresponding pre-trained models at doi.org/10.5281/zenodo.5589870. |
| Open Datasets | Yes | Our central result is that after training, Rf correlates very strongly with the test error ϵt: during training, Rf is reduced by several decades in current State Of The Art (SOTA) architectures on four benchmark datasets: MNIST (LeCun et al., 1998), FashionMNIST (Xiao et al., 2017), CIFAR10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009). |
| Dataset Splits | No | The paper does not explicitly provide details on train/validation/test splits beyond mentioning the use of a "test set" and varying training set size P. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like SGD and cross-entropy loss, but it does not specify version numbers for any libraries or frameworks (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | All neural nets are trained using Stochastic Gradient Descent (SGD) with momentum 0.9 and a batch size of 128. The learning rate schedule is cosine decay with warm restarts (Loshchilov and Hutter, 2016), starting from 0.1 for CIFAR10 and F-MNIST and 0.01 for MNIST. We train the nets for 200 epochs. (A hedged reconstruction of this setup appears after this table.) |
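
The paper's central quantity is the relative stability Rf = Df/Gf, which compares a trained network's sensitivity to small smooth deformations (diffeomorphisms) with its sensitivity to isotropic noise of matching norm; the reported empirical relation on CIFAR10 is ϵt ≈ 0.2 √Rf. The following is a minimal sketch of how such a quantity could be estimated, assuming a trained classifier `model` and a test-set `loader`. The low-frequency random displacement field here is a simplified stand-in for the max-entropy diffeomorphisms of the authors' library (github.com/pcsl-epfl/diffeomorphism), and the deformation amplitude is an arbitrary choice, not the paper's protocol.

```python
"""Hedged sketch (not the authors' code): estimate relative stability R_f = D_f / G_f."""
import torch
import torch.nn.functional as F


def random_smooth_deform(x, amplitude=0.5, grid_size=4):
    """Apply a random smooth displacement field to a batch of images x of shape (B, C, H, W)."""
    b, c, h, w = x.shape
    # low-resolution random displacement, upsampled into a smooth field (in [-1, 1] grid units)
    disp = amplitude / h * torch.randn(b, 2, grid_size, grid_size, device=x.device)
    disp = F.interpolate(disp, size=(h, w), mode="bicubic", align_corners=False)
    # identity sampling grid in normalized coordinates
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=x.device),
        torch.linspace(-1, 1, w, device=x.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0) + disp.permute(0, 2, 3, 1)
    return F.grid_sample(x, grid, mode="bilinear", align_corners=False)


@torch.no_grad()
def relative_stability(model, loader, device="cpu"):
    """Return R_f = D_f / G_f averaged over the loader."""
    model.eval().to(device)
    d_num, g_num, denom = 0.0, 0.0, 0.0
    for x, _ in loader:
        x = x.to(device)
        fx = model(x)
        # D_f numerator: output change under a small smooth deformation tau
        x_tau = random_smooth_deform(x)
        d_num += (model(x_tau) - fx).pow(2).sum().item()
        # G_f numerator: output change under noise eta with the same norm as (tau x - x)
        eta = torch.randn_like(x)
        scale = (x_tau - x).flatten(1).norm(dim=1) / eta.flatten(1).norm(dim=1)
        eta = eta * scale.view(-1, 1, 1, 1)
        g_num += (model(x + eta) - fx).pow(2).sum().item()
        # shared normalization: output variation between distinct inputs
        denom += (fx - fx.roll(1, dims=0)).pow(2).sum().item()
    d_f, g_f = d_num / denom, g_num / denom
    return d_f / g_f
```

A value of Rf well below 1 would indicate that the network has become much more stable to diffeomorphisms than to generic noise, which is the regime the paper associates with low test error.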
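
The reported training hyperparameters map directly onto a standard PyTorch loop. The sketch below is an assumption-laden reconstruction, not the authors' training script: the architecture (ResNet-18 as a stand-in for the 15 architectures studied), the warm-restart period `T_0`, and the absence of weight decay are choices the quoted setup does not specify.

```python
"""Hedged sketch of the reported setup: SGD (momentum 0.9), batch size 128,
cosine decay with warm restarts from lr 0.1 (CIFAR10 / F-MNIST; 0.01 for MNIST), 200 epochs."""
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as T

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=10)  # placeholder architecture
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # lr=0.01 for MNIST
scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)  # T_0 is an assumption

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

for epoch in range(200):
    for i, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        # advance the warm-restart schedule fractionally within the epoch
        scheduler.step(epoch + i / len(loader))
```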