Manifold Mixup: Better Representations by Interpolating Hidden States
Authors: Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, David Lopez-Paz, Yoshua Bengio
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Throughout a wide variety of experiments, we demonstrate four substantial benefits of Manifold Mixup: Better generalization than other competitive regularizers (such as Cutout, Mixup, AdaMix, and Dropout) (Section 5.1). Improved log-likelihood on test samples (Section 5.1). Increased performance at predicting data subject to novel deformations (Section 5.2). Improved robustness to single-step adversarial attacks. This is evidence Manifold Mixup pushes the decision boundary away from the data in some directions (Section 5.3). |
| Researcher Affiliation | Collaboration | ¹Aalto University, Finland; ²Montréal Institute for Learning Algorithms (MILA); ³Sharif University of Technology; ⁴Facebook Research |
| Pseudocode | No | The paper describes the steps of Manifold Mixup in narrative form within Section 2, but does not include a formally labeled pseudocode or algorithm block. A hedged code sketch of the hidden-state interpolation is given after the table. |
| Open Source Code | No | The paper does not provide any explicit statement about making source code available or links to a code repository. |
| Open Datasets | Yes | We show results for the CIFAR-10 (Table 1a), CIFAR-100 (Table 1b), SVHN (Table 2), and Tiny ImageNet (Table 3) datasets. |
| Dataset Splits | No | For each regularizer, we selected the best hyper-parameters using a validation set. While a validation set is mentioned, no specific details about its size, split percentage, or how it was created for reproducibility are provided in the main text. |
| Hardware Specification | No | The acknowledgements section mentions 'Compute Canada for providing computing resources used in this work', but no specific hardware details such as GPU models, CPU types, or memory specifications are provided for the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are mentioned in the paper. |
| Experiment Setup | Yes | We follow the training procedure of (Zhang et al., 2018), which is to use SGD with momentum, a weight decay of 10⁻⁴, and a step-wise learning rate decay. Please refer to Appendix C for further details (including the values of the hyperparameter α). A hedged configuration sketch based on this recipe follows the table. |
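
Since the paper gives Manifold Mixup only as a narrative description (see the Pseudocode row above), the following is a minimal sketch of the hidden-state interpolation, assuming a PyTorch-style network split into stages. The class name, the `stages`/`classifier` split, and the default α value are illustrative assumptions, not details taken from the paper.

```python
# Minimal Manifold Mixup sketch (PyTorch-style); the layer split and names are illustrative.
import numpy as np
import torch
import torch.nn as nn

class ManifoldMixupNet(nn.Module):
    def __init__(self, stages: nn.ModuleList, classifier: nn.Module, alpha: float = 2.0):
        super().__init__()
        self.stages = stages          # sub-networks; mixing may occur before/after any of them
        self.classifier = classifier  # final layers producing logits
        self.alpha = alpha            # Beta(alpha, alpha) parameter for the mixing coefficient

    def forward(self, x, y=None, mixup: bool = False):
        if not mixup:
            for stage in self.stages:
                x = stage(x)
            return self.classifier(x)

        # Pick a random mixing point (k = 0 corresponds to mixing the inputs themselves).
        k = np.random.randint(0, len(self.stages) + 1)
        lam = float(np.random.beta(self.alpha, self.alpha))
        perm = torch.randperm(x.size(0), device=x.device)

        h = x
        for i, stage in enumerate(self.stages):
            if i == k:
                h = lam * h + (1.0 - lam) * h[perm]   # interpolate hidden states within the batch
            h = stage(h)
        if k == len(self.stages):
            h = lam * h + (1.0 - lam) * h[perm]       # mixing point after the last stage

        logits = self.classifier(h)
        return logits, y, y[perm], lam
```

The returned `(logits, y, y[perm], lam)` tuple lets the training loop form the mixed objective `lam * CE(logits, y) + (1 - lam) * CE(logits, y[perm])`, which is equivalent to interpolating one-hot targets with the same coefficient used for the hidden states.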
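
For the training recipe quoted in the Experiment Setup row, a minimal optimizer/scheduler sketch in PyTorch might look as follows; the learning rate, momentum, milestones, and decay factor are placeholders, since the paper defers those hyperparameters (including α) to Appendix C.

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 10)  # stand-in model; the paper trains ResNet-style networks
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,       # lr is an illustrative placeholder
                            momentum=0.9, weight_decay=1e-4)  # weight decay of 10^-4 as quoted
# Step-wise learning rate decay; milestones and gamma are illustrative placeholders.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)
```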