Multi-scale Feature Learning Dynamics: Insights for Double Descent

Authors: Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct numerical simulations to validate our analytical results and provide clear insights on the macroscopic dynamics of generalization. We also conduct experiments on real-world neural networks, showing a close qualitative match between the generalization behavior of neural networks and our teacher-student setup.
Researcher Affiliation | Academia | ¹Mila, Québec AI Institute; ²Dept. of Computer Science and Operational Research, Université de Montréal; ³University of California, Riverside; ⁴Canada CIFAR AI Chair; ⁵Dept. of Mathematics and Statistics, Université de Montréal.
Pseudocode | No | The paper describes the analytical framework and experimental procedures in narrative text and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | To ensure reproducibility, we include the complete source code in a GitHub repository as well as a Colab notebook.
Open Datasets | Yes | We conduct an experiment on the classification task of CIFAR-10 (Krizhevsky et al., 2009) with varying amounts of weight decay regularization strength λ.
Dataset Splits | No | The paper mentions training on CIFAR-10 and monitoring generalization error (0-1 test error), but does not specify the explicit train/validation/test dataset splits (e.g., percentages or sample counts) needed for reproducibility.
Hardware Specification | No | The paper does not specify the exact hardware used for experiments, such as particular GPU or CPU models. It only generally acknowledges 'Calcul Québec and Compute Canada for providing us with the computing resources'.
Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' for training and 'ResNet-18 (He et al., 2016)' as the model, but does not provide specific version numbers for software libraries or dependencies such as PyTorch, TensorFlow, or Python.
Experiment Setup | Yes | We train a ResNet-18 (He et al., 2016) with layer widths [64, 2×64, 4×64, 8×64]. We follow the training setup of Nakkiran et al. (2019): label noise with probability 0.15 randomly assigns an incorrect label to training examples. Noise is sampled only once before training starts. We train using Adam (Kingma & Ba, 2014) with a learning rate of 1e-4 for 1K epochs. Experiments are averaged over 50 random seeds. We conduct an experiment on the classification task of CIFAR-10 (Krizhevsky et al., 2009) with varying amounts of weight decay regularization strength λ.
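
The Experiment Setup row above reads like a training recipe, so the following minimal PyTorch sketch shows how that recipe could be approximated. It is an assumption-laden illustration, not the authors' released code: torchvision's stock resnet18 stands in for the Nakkiran et al. (2019) CIFAR variant with widths [64, 2×64, 4×64, 8×64], the helper names add_label_noise and train_one_setting are hypothetical, and the λ values in the final sweep are illustrative.

```python
# Hedged sketch of the CIFAR-10 label-noise / weight-decay experiment described above.
# Assumptions: torchvision's resnet18 replaces the paper's CIFAR ResNet-18 variant,
# and Adam's built-in L2 `weight_decay` stands in for the paper's regularizer lambda.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T


def add_label_noise(targets, noise_prob=0.15, num_classes=10, seed=0):
    """Randomly reassign labels with probability `noise_prob`, sampled once before training."""
    g = torch.Generator().manual_seed(seed)
    targets = torch.tensor(targets)
    flip = torch.rand(len(targets), generator=g) < noise_prob
    random_labels = torch.randint(0, num_classes, (len(targets),), generator=g)
    targets[flip] = random_labels[flip]
    return targets.tolist()


def train_one_setting(weight_decay, epochs=1000, lr=1e-4):
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # CIFAR-10 training set with label noise injected once, before training starts.
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor()
    )
    train_set.targets = add_label_noise(train_set.targets, noise_prob=0.15)
    loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

    # Stand-in architecture (the paper uses the Nakkiran et al. CIFAR ResNet-18).
    model = torchvision.models.resnet18(num_classes=10).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model


# Illustrative sweep over regularization strengths; the paper averages each
# setting over 50 random seeds, omitted here for brevity.
for lam in [0.0, 1e-4, 1e-3]:
    train_one_setting(weight_decay=lam)
```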