Over-Training with Mixup May Hurt Generalization

Authors: Zixuan Liu, Ziqiao Wang, Hongyu Guo, Yongyi Mao

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are performed on a variety of benchmark datasets, validating this explanation. Experiments are conducted on CIFAR10, CIFAR100 and SVHN, with both ERM and Mixup training.
Researcher Affiliation | Collaboration | Zixuan Liu (1), Ziqiao Wang (1), Hongyu Guo (2,1), Yongyi Mao (1); (1) University of Ottawa, (2) National Research Council Canada.
Pseudocode | No | The paper contains mathematical formulations and method descriptions, but no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets | Yes | Experiments are conducted on CIFAR10, CIFAR100 and SVHN, all publicly available benchmark datasets, with both ERM and Mixup training.
Dataset Splits | Yes | For each dataset, both the original data and balanced subsets obtained by downsampling the original data at certain proportions are used. For example, Figure 2a shows results of training ResNet18 on 30% of the CIFAR10 data without data augmentation (a downsampling sketch follows this table).
Hardware Specification | No | The paper does not provide any details about the hardware used to run the experiments.
Software Dependencies | No | The paper mentions SGD (optimizer) and ResNet18 (model architecture), but does not specify the software libraries or version numbers required for reproduction.
Experiment Setup | Yes | SGD with weight decay is used, and training is performed for up to 1600 epochs on CIFAR10. The learning rate is set to 0.1, and full-batch gradient descent with an MSE loss is used to train the student network (a training sketch follows this table).
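The Dataset Splits row refers to balanced subsets obtained by downsampling each dataset, e.g. 30% of CIFAR10 in Figure 2a. Since the paper releases no code, the following is only a minimal sketch of how such a class-balanced subset could be built with torchvision; the helper name balanced_subset, the seed, and the transform are assumptions, not the authors' implementation.

```python
# Hypothetical sketch: class-balanced downsampling of CIFAR10 to a given
# proportion (e.g. 30%, as in Figure 2a). Names and defaults are assumptions;
# the paper does not release code.
import numpy as np
from torch.utils.data import Subset
from torchvision import datasets, transforms

def balanced_subset(dataset, proportion, seed=0):
    """Keep `proportion` of the samples from each class (balanced downsampling)."""
    rng = np.random.default_rng(seed)
    targets = np.array(dataset.targets)
    keep = []
    for c in np.unique(targets):
        idx = np.where(targets == c)[0]
        n_keep = int(len(idx) * proportion)
        keep.extend(rng.choice(idx, size=n_keep, replace=False))
    return Subset(dataset, sorted(keep))

transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transform)
train_30 = balanced_subset(full_train, proportion=0.3)  # e.g. 30% of CIFAR10
```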
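The Experiment Setup row quotes SGD with weight decay, a learning rate of 0.1, and training for up to 1600 epochs on CIFAR10. Below is a minimal Mixup training sketch consistent with that description, reusing the train_30 subset from the sketch above. The mixup parameter alpha, the momentum, the weight-decay value, and the batch size are placeholders rather than values reported in the paper, and the stock torchvision ResNet18 stands in for whatever CIFAR variant the authors used.

```python
# Minimal Mixup training sketch matching the quoted setup (SGD with weight
# decay, learning rate 0.1, up to 1600 epochs on CIFAR10). alpha, momentum,
# weight decay, and batch size are assumed placeholders, not the paper's values.
import numpy as np
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18(num_classes=10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)  # assumed values
loader = DataLoader(train_30, batch_size=128, shuffle=True)   # subset from above

alpha = 1.0  # assumed Beta(alpha, alpha) parameter for mixup
for epoch in range(1600):  # "up to 1600 epochs" for CIFAR10
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        lam = np.random.beta(alpha, alpha)
        perm = torch.randperm(x.size(0), device=device)
        x_mix = lam * x + (1 - lam) * x[perm]  # mix pairs of inputs
        logits = model(x_mix)
        # Mixing the losses is equivalent to mixing one-hot labels under
        # cross-entropy.
        loss = (lam * F.cross_entropy(logits, y)
                + (1 - lam) * F.cross_entropy(logits, y[perm]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```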