Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models
Authors: Kaiyue Wen, Jiaye Teng, Jingzhao Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To understand why benign overfitting fails in the ImageNet experiment, we theoretically analyze benign overfitting under a more restrictive setup... Our analysis explains our empirical observations, and is validated by a set of control experiments with ResNets. Our analysis is supported by both synthetic and deep learning experiments. |
| Researcher Affiliation | Academia | 1 Institute for Interdisciplinary Information Sciences, Tsinghua University; 2 Shanghai Qizhi Institute; 3 Shanghai Artificial Intelligence Laboratory. {wenky20,tjy20}@mails.tsinghua.edu.cn, jingzhaoz@mail.tsinghua.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The text 'More details can be found in the code.' is insufficient as it does not explicitly state the code is open-source or provide a link. |
| Open Datasets | Yes | We test whether ResNet (He et al., 2016) models overfit data benignly for image classification on CIFAR10 and ImageNet. For Penn Treebank, we train a standard transformer from scratch. |
| Dataset Splits | Yes | We use ResNet50/ResNet18 to train ImageNet/CIFAR10 and plot the training loss as well as the validation loss. We train each model for 200 epochs, test the validation accuracy and plot the training accuracy and validation accuracy in Figure 3. |
| Hardware Specification | No | The paper does not provide specific hardware details for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | For ImageNet, we train a ResNet50 from scratch. We use standard cross entropy loss. We first train the model for 90 epochs using SGD as optimizer, with initial learning rate 1e-1, momentum 0.9 and weight decay 1e-4. The learning rate decays by a factor of 10 every 30 iterations. Then we train the model for another 410 epochs, with initial learning rate 1e-3, momentum 0.9 and weight decay 1e-4, and the learning rate decays by a factor of 1.25 every 50 iterations. For CIFAR10, we randomly flip the label with probability ρ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6} and for each train a ResNet18 from scratch. We use standard cross entropy loss. We train each model for 200 epochs using SGD as optimizer, with learning rate 1e-1, momentum 0.9 and weight decay 5e-4. We use a cosine learning rate decay scheduler. For Penn Treebank, we train a standard transformer from scratch. We train one model for 4800 epochs using ADAM as optimizer, with learning rate 5e-4, beta (0.9, 0.98) and weight decay 1e-2. We use an inverse square learning rate scheduler. We train another model for 140 epochs using SGD as optimizer with learning rate 5.0. We use a step learning rate schedule with step size 1 and gamma 0.95. For synthetic GMM data, we train a linear classifier for each dataset, initialized from 0. We use logistic loss. We train each model using SGD as optimizer with learning rate 1e-5 until the training loss decreases below 0.05. (An illustrative training-config sketch follows the table.) |
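
Since the paper does not release code (see the Open Source Code row), the block below is a minimal PyTorch sketch of the CIFAR10 label-noise setup quoted in the Experiment Setup row, not the authors' implementation. Only the hyperparameters (SGD, lr 1e-1, momentum 0.9, weight decay 5e-4, cosine decay, 200 epochs, cross entropy, flip probability ρ) come from the paper; the `flip_labels` helper, the exact flipping scheme (uniform random relabeling), batch size, and data pipeline are assumptions.

```python
# Illustrative reconstruction of the CIFAR10 label-noise training described in the paper.
# Hyperparameters follow the quoted setup; everything else is assumed for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

RHO = 0.2          # label-flip probability; the paper sweeps {0.1, ..., 0.6}
EPOCHS = 200
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

def flip_labels(targets, rho, num_classes=10, seed=0):
    """With probability rho, replace each label by a uniformly random class (assumed scheme)."""
    g = torch.Generator().manual_seed(seed)
    targets = torch.as_tensor(targets).clone()
    mask = torch.rand(len(targets), generator=g) < rho
    targets[mask] = torch.randint(0, num_classes, (int(mask.sum()),), generator=g)
    return targets

transform = transforms.Compose([transforms.ToTensor()])
train_set = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
train_set.targets = flip_labels(train_set.targets, RHO).tolist()
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)

# torchvision's stock ResNet18, trained from scratch (the paper's exact architecture
# variant for CIFAR10 is not specified in the quoted text).
model = models.resnet18(num_classes=10).to(DEVICE)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    model.train()
    for x, y in train_loader:
        x, y = x.to(DEVICE), y.to(DEVICE)
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    scheduler.step()
```

Training past the point of fitting the noisy labels while tracking clean validation accuracy is what lets one observe whether the overfitting is benign under each noise level ρ.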