mixup: Beyond Empirical Risk Minimization

Authors: Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures.
Researcher Affiliation | Collaboration | Hongyi Zhang (MIT); Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz (FAIR)
Pseudocode | Yes | Figure 1a shows the few lines of code necessary to implement mixup training in PyTorch (a hedged sketch of this training step is given after the table).
Open Source Code | Yes | The source-code necessary to replicate our CIFAR-10 experiments is available at: https://github.com/facebookresearch/mixup-cifar10.
Open Datasets | Yes | We evaluate mixup on the ImageNet-2012 classification dataset (Russakovsky et al., 2015). This dataset contains 1.3 million training images and 50,000 validation images, from a total of 1,000 classes.
Dataset Splits | Yes | This dataset contains 1.3 million training images and 50,000 validation images, from a total of 1,000 classes.
Hardware Specification | Yes | All models are trained on a single Nvidia Tesla P100 GPU using PyTorch for 200 epochs on the training set with 128 examples per minibatch, and evaluated on the test set.
Software Dependencies | No | The paper mentions Caffe2 and PyTorch, but without specific version numbers. For example: "data-parallel distributed training in Caffe2" and "trained on a single Nvidia Tesla P100 GPU using PyTorch".
Experiment Setup | Yes | For all the experiments in this section, we use data-parallel distributed training in Caffe2 with a minibatch size of 1,024. We use the learning rate schedule described in (Goyal et al., 2017). Specifically, the learning rate is increased linearly from 0.1 to 0.4 during the first 5 epochs, and it is then divided by 10 after 30, 60 and 80 epochs when training for 90 epochs; or after 60, 120 and 180 epochs when training for 200 epochs (a sketch of this schedule also follows the table).
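
The Pseudocode row above points to the paper's Figure 1a, which implements mixup training in a few lines of PyTorch. As a rough illustration of the mechanism rather than a copy of the authors' code, here is a minimal sketch of one mixup training step; the names `mixup_step`, `net`, and `alpha` are ours, and mixing the two cross-entropy losses is used here as an equivalent stand-in for the paper's mixing of one-hot label vectors.

```python
# Minimal sketch of a mixup training step in PyTorch (illustrative, not the
# paper's Figure 1a verbatim).
import numpy as np
import torch
import torch.nn.functional as F

def mixup_step(net, optimizer, x1, y1, x2, y2, alpha=1.0):
    """One optimization step on a convex combination of two labelled batches."""
    lam = np.random.beta(alpha, alpha)      # mixing coefficient drawn from Beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2         # mix the inputs
    logits = net(x)
    # Mixing the losses with the same lambda is equivalent to training on
    # mixed one-hot targets under cross-entropy.
    loss = lam * F.cross_entropy(logits, y1) + (1.0 - lam) * F.cross_entropy(logits, y2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the two batches `(x1, y1)` and `(x2, y2)` are typically obtained by shuffling a single minibatch, so no second data loader is required.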
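The Experiment Setup row quotes the warmup-and-step learning rate schedule of Goyal et al. (2017). The sketch below is one plausible reading of that description as a function of the epoch index; the helper name `imagenet_lr` and the exact warmup interpolation are our assumptions, not taken from the paper.

```python
# Sketch of the quoted schedule: linear warmup from 0.1 to 0.4 over the first
# 5 epochs, then divide by 10 at the listed milestones (30/60/80 for a
# 90-epoch run, 60/120/180 for a 200-epoch run).
def imagenet_lr(epoch, epochs_total=90, base_lr=0.1, peak_lr=0.4):
    """Return the learning rate for a given (0-indexed) epoch."""
    if epoch < 5:
        # linear warmup over the first 5 epochs
        return base_lr + (peak_lr - base_lr) * epoch / 5
    milestones = (30, 60, 80) if epochs_total == 90 else (60, 120, 180)
    drops = sum(1 for m in milestones if epoch >= m)  # number of /10 decays applied so far
    return peak_lr / (10 ** drops)
```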