mixup: Beyond Empirical Risk Minimization
Authors: Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. |
| Researcher Affiliation | Collaboration | Hongyi Zhang (MIT); Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz (FAIR) |
| Pseudocode | Yes | Figure 1a shows the few lines of code necessary to implement mixup training in PyTorch. (A hedged sketch of such a training step is given below the table.) |
| Open Source Code | Yes | The source-code necessary to replicate our CIFAR-10 experiments is available at: https://github.com/facebookresearch/mixup-cifar10. |
| Open Datasets | Yes | We evaluate mixup on the ImageNet-2012 classification dataset (Russakovsky et al., 2015). This dataset contains 1.3 million training images and 50,000 validation images, from a total of 1,000 classes. |
| Dataset Splits | Yes | This dataset contains 1.3 million training images and 50,000 validation images, from a total of 1,000 classes. |
| Hardware Specification | Yes | All models are trained on a single Nvidia Tesla P100 GPU using PyTorch for 200 epochs on the training set with 128 examples per minibatch, and evaluated on the test set. |
| Software Dependencies | No | The paper mentions Caffe2 and PyTorch, but without specific version numbers. For example: "data-parallel distributed training in Caffe2" and "trained on a single Nvidia Tesla P100 GPU using PyTorch". |
| Experiment Setup | Yes | For all the experiments in this section, we use data-parallel distributed training in Caffe2 with a minibatch size of 1,024. We use the learning rate schedule described in (Goyal et al., 2017). Specifically, the learning rate is increased linearly from 0.1 to 0.4 during the first 5 epochs, and it is then divided by 10 after 30, 60 and 80 epochs when training for 90 epochs; or after 60, 120 and 180 epochs when training for 200 epochs. (A sketch of this schedule is given below the table.) |
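
To accompany the pseudocode row, here is a minimal sketch of a mixup training step in PyTorch. The loss form (two cross-entropy terms weighted by the mixing coefficient) and the Beta-distributed coefficient follow the paper's description and the released mixup-cifar10 code; the function names, the `train_loader`/`optimizer` objects, and the default `alpha` are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_data(x, y, alpha=1.0):
    """Return mixed inputs, the pair of targets, and the mixing coefficient lambda."""
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0), device=x.device)  # random pairing within the batch
    mixed_x = lam * x + (1 - lam) * x[index]             # convex combination of inputs
    return mixed_x, y, y[index], lam

def mixup_criterion(pred, y_a, y_b, lam):
    """Cross-entropy against both targets, weighted by lambda."""
    return lam * F.cross_entropy(pred, y_a) + (1 - lam) * F.cross_entropy(pred, y_b)

def train_one_epoch(model, optimizer, train_loader, alpha=1.0, device="cuda"):
    """One epoch of mixup training; model, optimizer, and train_loader are assumed to exist."""
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        mixed_x, y_a, y_b, lam = mixup_data(x, y, alpha)
        optimizer.zero_grad()
        loss = mixup_criterion(model(mixed_x), y_a, y_b, lam)
        loss.backward()
        optimizer.step()
```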
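
The experiment-setup quote fully specifies the ImageNet learning-rate schedule: a linear warmup from 0.1 to 0.4 over the first 5 epochs, then division by 10 at fixed milestone epochs. As an illustration only, a per-epoch version of that schedule could look like the sketch below; the milestones and rates are taken from the quote, while the function name and signature are assumptions.

```python
def learning_rate(epoch, total_epochs=90):
    """Per-epoch learning rate matching the quoted ImageNet schedule."""
    # Linear warmup from 0.1 to 0.4 over the first 5 epochs.
    if epoch < 5:
        return 0.1 + (0.4 - 0.1) * epoch / 5
    # Divide by 10 at the quoted milestones for 90- or 200-epoch training.
    milestones = [30, 60, 80] if total_epochs == 90 else [60, 120, 180]
    lr = 0.4
    for m in milestones:
        if epoch >= m:
            lr /= 10
    return lr
```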