Learning sparse features can lead to overfitting in neural networks
Authors: Leonardo Petrini, Francesco Cagnetta, Eric Vanden-Eijnden, Matthieu Wyart
Venue: NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that learning sparse features can lead to severe overfitting in neural networks. Our empirical results suggest that the overfitting phenomenon is caused by the model’s over-reliance on a few sparse features... We perform extensive experiments on the CIFAR-10, CIFAR-100 and ImageNet datasets. |
| Researcher Affiliation | Academia | Leonardo Petrini, Francesco Cagnetta, Matthieu Wyart: EPFL. Eric Vanden-Eijnden: Courant Institute of Mathematical Sciences, New York University. |
| Pseudocode | No | The paper describes methods and strategies in natural language and mathematical equations, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We perform extensive experiments on the CIFAR-10, CIFAR-100 and ImageNet datasets. ... We preprocess ImageNet following [14]. |
| Dataset Splits | Yes | We use the standard 50k/10k train/test split for CIFAR-10 and CIFAR-100. We preprocess ImageNet following [14]. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as exact GPU or CPU models; it only implies that GPUs were used, without specifying them. |
| Software Dependencies | No | The paper mentions software like PyTorch and the SGD optimizer (Appendix A), but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We use SGD optimizer with momentum 0.9 and weight decay 5e-4. For CIFAR-10 and CIFAR-100, the models are trained for 200 epochs with a batch size of 128. The learning rate is initialized to 0.1 and divided by 10 at epochs 100 and 150. For ImageNet, the models are trained for 100 epochs with a batch size of 256. The learning rate is initialized to 0.1 and divided by 10 at epochs 30, 60 and 90. |
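As a point of reference for the Dataset Splits row, the 50k/10k CIFAR split quoted above is the standard torchvision split. The snippet below is a minimal sketch of loading it, assuming PyTorch and torchvision; the normalization statistics are commonly used CIFAR-10 values, not numbers taken from the paper.

```python
# Minimal sketch of the standard CIFAR-10 train/test split (50k/10k),
# assuming PyTorch and torchvision; not taken from the paper's code.
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

transform = T.Compose([
    T.ToTensor(),
    # Commonly used CIFAR-10 statistics (assumption, not from the paper).
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
assert len(train_set) == 50_000 and len(test_set) == 10_000  # standard split

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False, num_workers=4)
```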
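The Experiment Setup row fully specifies the reported optimizer and learning-rate schedule for the CIFAR runs. The sketch below illustrates that configuration, assuming PyTorch; the ResNet-18 model is a placeholder (the paper's architectures are not reproduced here), and `train_loader` is the loader from the previous snippet.

```python
# Sketch of the reported CIFAR training configuration: SGD with momentum 0.9,
# weight decay 5e-4, 200 epochs, batch size 128, and a learning rate of 0.1
# divided by 10 at epochs 100 and 150. The ResNet-18 model is a placeholder.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(num_classes=10)  # placeholder architecture
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Divide the learning rate by 10 at epochs 100 and 150.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[100, 150], gamma=0.1)

model.train()
for epoch in range(200):
    for inputs, targets in train_loader:  # loader from the previous snippet
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

For the ImageNet configuration in the same row, only the schedule changes: batch size 256, 100 epochs, and learning-rate drops at epochs 30, 60 and 90.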