Manifold Mixup: Better Representations by Interpolating Hidden States

Authors: Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, David Lopez-Paz, Yoshua Bengio

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Throughout a wide variety of experiments, we demonstrate four substantial benefits of Manifold Mixup: (1) better generalization than other competitive regularizers (such as Cutout, Mixup, AdaMix, and Dropout) (Section 5.1); (2) improved log-likelihood on test samples (Section 5.1); (3) increased performance at predicting data subject to novel deformations (Section 5.2); (4) improved robustness to single-step adversarial attacks, which is evidence that Manifold Mixup pushes the decision boundary away from the data in some directions (Section 5.3)."
Researcher Affiliation | Collaboration | 1 Aalto University, Finland; 2 Montréal Institute for Learning Algorithms (MILA); 3 Sharif University of Technology; 4 Facebook Research
Pseudocode | No | The paper describes the steps of Manifold Mixup in narrative form in Section 2, but does not include a formally labeled pseudocode or algorithm block (a hedged training-step sketch is given after this table).
Open Source Code | No | The paper does not provide any explicit statement about making source code available, nor a link to a code repository.
Open Datasets | Yes | "We show results for the CIFAR-10 (Table 1a), CIFAR-100 (Table 1b), SVHN (Table 2), and Tiny ImageNet (Table 3) datasets."
Dataset Splits | No | "For each regularizer, we selected the best hyper-parameters using a validation set." A validation set is mentioned, but no specific details about its size, split percentage, or how it was created are provided in the main text, so the split is not reproducible from the paper alone.
Hardware Specification | No | The acknowledgements thank "Compute Canada for providing computing resources used in this work", but no specific hardware details (GPU models, CPU types, or memory) are given for the experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are mentioned in the paper.
Experiment Setup | Yes | "We follow the training procedure of (Zhang et al., 2018), which is to use SGD with momentum, a weight decay of 10^-4, and a step-wise learning rate decay. Please refer to Appendix C for further details (including the values of the hyperparameter α)." (See the optimizer configuration sketch after this table.)
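
Since the paper presents Manifold Mixup only in narrative form (Section 2), the following is a minimal sketch of how such a training step is commonly implemented. The split of the model into a list of `blocks` plus a `classifier` head, the Beta-distribution default, and the within-batch shuffling trick are illustrative assumptions, not the authors' released code.

```python
import numpy as np
import torch
import torch.nn.functional as F

def manifold_mixup_loss(blocks, classifier, x, y, alpha=2.0):
    """Loss for one training step with hidden-state interpolation at a random layer."""
    lam = float(np.random.beta(alpha, alpha))            # mixing coefficient ~ Beta(alpha, alpha)
    k = int(np.random.randint(0, len(blocks) + 1))       # layer to mix at; k = 0 mixes the input
    index = torch.randperm(x.size(0), device=x.device)   # pair examples by shuffling the batch

    h = x
    if k == 0:                                           # input mixup as a special case
        h = lam * h + (1.0 - lam) * h[index]
    for i, block in enumerate(blocks, start=1):
        h = block(h)
        if i == k:                                       # interpolate hidden states at layer k
            h = lam * h + (1.0 - lam) * h[index]
    logits = classifier(h)

    # The same interpolation is applied to the targets via the mixed cross-entropy.
    return lam * F.cross_entropy(logits, y) + (1.0 - lam) * F.cross_entropy(logits, y[index])

# Illustrative usage with toy modules (shapes chosen arbitrarily):
blocks = [torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU()),
          torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())]
classifier = torch.nn.Linear(64, 10)
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
manifold_mixup_loss(blocks, classifier, x, y).backward()
```

Mixing the batch with a shuffled copy of itself (`h[index]`) is the standard single-pass way to form random example pairs without a second forward pass.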
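As a concrete reading of the quoted setup (SGD with momentum, weight decay of 10^-4, step-wise learning-rate decay), a minimal PyTorch configuration could look like the sketch below. The momentum value, initial learning rate, and decay milestones are placeholders, since the paper defers those values (including α) to its Appendix C.

```python
import torch

# Placeholder model standing in for the networks trained in the paper.
model = torch.nn.Linear(32, 10)

# SGD with momentum, weight decay 1e-4, and a step-wise learning-rate schedule.
# Momentum, initial learning rate, and milestones below are assumed values.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[100, 150], gamma=0.1)

# Training loop outline: call optimizer.step() per mini-batch,
# then scheduler.step() once per epoch to apply the step-wise decay.
```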