Neural networks trained with SGD learn distributions of increasing complexity

Authors: Maria Refinetti, Alessandro Ingrosso, Sebastian Goldt

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then demonstrate DSB empirically in a range of deep convolutional networks and visual transformers trained on CIFAR10, and show that it even holds in networks pre-trained on ImageNet.
Researcher Affiliation | Academia | (1) Laboratoire de Physique de l'École Normale Supérieure, Université PSL, CNRS, Sorbonne Université, Université Paris Diderot, Sorbonne Paris Cité, Paris, France; (2) IdePHICS laboratory, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; (3) The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, Italy; (4) International School for Advanced Studies (SISSA), Trieste, Italy.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code availability: Code to reproduce our experiments and to include CIFAR10 clones in your experiments can be found on GitHub: https://github.com/sgoldt/dist_inc_comp.
Open Datasets | Yes | Test accuracy of a ResNet18 evaluated on CIFAR10 during training with SGD on four different training data sets: the standard CIFAR10 training set (dark blue), and three different clones of the training set. ... We constructed several approximations to the original CIFAR10 data set (Krizhevsky et al., 2009) for the experiments described in Section 3.
Dataset Splits | No | The paper mentions evaluating 'test accuracy' and training on the 'CIFAR10 training set', which are standard parts of the CIFAR10 dataset. However, it does not explicitly provide the specific percentages or sample counts for the train/validation/test splits, nor does it explicitly mention a separate validation set.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU or CPU models, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions using 'pytorch (Paszke et al., 2019)' and the 'timm library (Wightman, 2019)' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Unless otherwise noted, we trained all models using vanilla SGD with learning rate 0.005, cosine learning rate schedule (Loshchilov & Hutter, 2017), weight decay 5e-4, momentum 0.9, mini-batch size 128, for 200 epochs (see Appendix B.2 for details). (Hedged sketches of this setup and of the reported dependencies follow this table.)
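
The Experiment Setup row quotes the optimizer hyperparameters verbatim. The PyTorch sketch below shows how those settings plug into a CIFAR10 training loop; the plain torchvision ResNet18 constructor, the absence of data augmentation, and the per-epoch test-accuracy loop are assumptions made for illustration, not the authors' released code (which is linked in the Open Source Code row and may use a CIFAR10-adapted architecture).

```python
# Sketch of the quoted setup: vanilla SGD, lr 0.005, cosine schedule,
# weight decay 5e-4, momentum 0.9, mini-batch size 128, 200 epochs.
# Model and data pipeline are assumptions for illustration only.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

def train_resnet18_on_cifar10(data_root="./data", epochs=200, device="cuda"):
    # Standard CIFAR10 ships as 50,000 training and 10,000 test images;
    # the paper does not report a separate validation split.
    transform = T.Compose([T.ToTensor()])
    train_set = torchvision.datasets.CIFAR10(data_root, train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(data_root, train=False, download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False, num_workers=2)

    # torchvision's ImageNet-style ResNet18 is a stand-in for the paper's model.
    model = torchvision.models.resnet18(num_classes=10).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    for epoch in range(epochs):
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()

        # Track test accuracy after each epoch, as in the paper's accuracy-vs-training curves.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for inputs, targets in test_loader:
                inputs, targets = inputs.to(device), targets.to(device)
                preds = model(inputs).argmax(dim=1)
                correct += (preds == targets).sum().item()
                total += targets.size(0)
        print(f"epoch {epoch + 1}: test accuracy {correct / total:.4f}")
```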
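
The Software Dependencies row notes that PyTorch and the timm library are cited without version numbers. A minimal sketch, assuming the ImageNet-pretrained networks mentioned in the Research Type row were obtained through timm (the specific model name here is a placeholder), together with a one-liner for recording the library versions actually installed:

```python
import torch
import timm

# Record the library versions in use, since the paper does not pin them.
print("torch", torch.__version__, "| timm", timm.__version__)

# Hypothetical example: load an ImageNet-pretrained backbone via timm and
# adapt its classification head to CIFAR10's 10 classes.
model = timm.create_model("resnet18", pretrained=True, num_classes=10)
```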