Multi-scale Feature Learning Dynamics: Insights for Double Descent
Authors: Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this Section, we conduct numerical simulations to validate our analytical results and provide clear insights on the macroscopic dynamics of generalization. We also conduct experiments on real-world neural networks showing a close qualitative match between the generalization behavior of neural networks and our teacher-student setup. |
| Researcher Affiliation | Academia | 1Mila, Québec AI Institute 2Dept. of Computer Science and Operational Research, Université de Montréal 3University of California, Riverside 4Canada CIFAR AI Chair 5Dept. of Mathematics and Statistics, Université de Montréal. |
| Pseudocode | No | The paper describes the analytical framework and experimental procedures in narrative text and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | To ensure reproducibility, we include the complete source code in a GitHub repository as well as a Colab notebook. |
| Open Datasets | Yes | We conduct an experiment on the classification task of CIFAR-10 (Krizhevsky et al., 2009) with varying amounts of weight decay regularization strength λ. |
| Dataset Splits | No | The paper mentions training on CIFAR-10 and monitoring generalization error (0-1 test error), but does not specify the explicit train/validation/test dataset splits (e.g., percentages or sample counts) used for reproducibility. |
| Hardware Specification | No | The paper does not specify the exact hardware used for experiments, such as particular GPU or CPU models. It only generally acknowledges 'Calcul Québec and Compute Canada for providing us with the computing resources'. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' for training and 'ResNet-18 (He et al., 2016)' as the model, but does not provide specific version numbers for software libraries or dependencies like PyTorch, TensorFlow, or Python. |
| Experiment Setup | Yes | We train a ResNet-18 (He et al., 2016) with layer widths [64, 2·64, 4·64, 8·64]. We follow the training setup of Nakkiran et al. (2019); label noise with probability 0.15 randomly assigns an incorrect label to training examples. Noise is sampled only once before the training starts. We train using Adam (Kingma & Ba, 2014) with a learning rate of 1e-4 for 1K epochs. Experiments are averaged over 50 random seeds. We conduct an experiment on the classification task of CIFAR-10 (Krizhevsky et al., 2009) with varying amounts of weight decay regularization strength λ (see the sketch after this table). |
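The setup quoted in the last row maps onto a fairly standard training loop. Below is a minimal sketch, assuming PyTorch/torchvision (the report does not name the framework or versions), with torchvision's stock ResNet-18 standing in for the width-[64, 128, 256, 512] network, a hypothetical placeholder value for the swept weight-decay strength λ, and weight decay passed via Adam's `weight_decay` argument rather than an explicit L2 term; label noise is sampled once before training, as described.

```python
# Sketch of the reported CIFAR-10 double-descent setup (assumptions noted above).
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

LABEL_NOISE_P = 0.15   # probability of replacing a training label (Nakkiran et al., 2019)
LR = 1e-4              # Adam learning rate from the reported setup
EPOCHS = 1000          # "1K epochs"
WEIGHT_DECAY = 5e-4    # placeholder for lambda; the paper sweeps this value

def add_label_noise(targets, p, num_classes=10, seed=0):
    """Reassign each label to a random *incorrect* class with probability p, sampled once."""
    g = torch.Generator().manual_seed(seed)
    targets = torch.tensor(targets)
    flip = torch.rand(len(targets), generator=g) < p
    offsets = torch.randint(1, num_classes, (int(flip.sum()),), generator=g)
    targets[flip] = (targets[flip] + offsets) % num_classes  # guaranteed != original label
    return targets.tolist()

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
train_set.targets = add_label_noise(train_set.targets, LABEL_NOISE_P)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)  # stage widths [64, 128, 256, 512]
optimizer = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```

Averaging over the 50 random seeds mentioned in the setup would correspond to repeating this loop with different seeds for the label noise, data order, and model initialization, and averaging the resulting 0-1 test error curves.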