Towards Understanding the Data Dependency of Mixup-style Training

Authors: Muthu Chidambaram, Xiang Wang, Yuzheng Hu, Chenwei Wu, Rong Ge

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify that the theory predicts the experiments, we train a two-layer feedforward neural network with 512 hidden units and ReLU activations on X 2 3 with and without Mixup. (A minimal sketch of this setup is given after the table.)
Researcher Affiliation | Academia | Muthu Chidambaram¹, Xiang Wang¹, Yuzheng Hu², Chenwei Wu¹, and Rong Ge¹; ¹Duke University, ²University of Illinois at Urbana-Champaign
Pseudocode | No | The paper describes mathematical derivations and experimental procedures in prose, but it does not include any clearly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | All of the code used to generate the plots and experimental results in this paper can be found at: https://github.com/2014mchidamb/Mixup-Data-Dependency.
Open Datasets | Yes | We validate this by training ResNet-18 (He et al., 2015) (using the popular implementation of Kuang Liu) on MNIST (LeCun, 1998), CIFAR-10, and CIFAR-100 (Krizhevsky, 2009) with and without Mixup (...) we consider the two moons dataset (Buitinck et al., 2013).
Dataset Splits | No | We validate this by training ResNet-18 (...) on MNIST (...), CIFAR-10, and CIFAR-100 (...) for 50 epochs with a batch size of 128 (...). The paper specifies training parameters and datasets, but it does not give percentages or counts for training, validation, and test splits, nor does it explicitly describe a distinct validation split.
Hardware Specification | No | The paper trains neural networks and conducts experiments but does not specify any hardware details such as specific GPU/CPU models, memory configurations, or cloud computing instance types used.
Software Dependencies | No | Our implementation uses PyTorch (Paszke et al., 2019) and is based heavily on the open source implementation of Manifold Mixup (Verma et al., 2019) by Shivam Saboo. (...) training using (full-batch) Adam (Kingma & Ba, 2015). The paper mentions software such as PyTorch and Adam and provides citations, but it does not state specific version numbers for these dependencies.
Experiment Setup | Yes | Results for training using (full-batch) Adam (Kingma & Ba, 2015) with the suggested (and common) hyperparameters of β1 = 0.9, β2 = 0.999 and a learning rate of 0.001 are shown in Figure 1. (...) We validate this by training ResNet-18 (...) for 50 epochs with a batch size of 128 and otherwise identical settings to the previous subsection. (Both sketches after the table use these quoted settings.)
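
The following is a minimal PyTorch sketch of the theory-validation experiment quoted in the Research Type and Experiment Setup rows: a two-layer, 512-hidden-unit ReLU network trained with Mixup under full-batch Adam (learning rate 0.001, β1 = 0.9, β2 = 0.999). The two-dimensional synthetic data, the Mixup parameter alpha = 1.0, and the step count are assumptions standing in for the paper's X 2 3 dataset and unquoted settings.

```python
import torch
import torch.nn as nn

# Sketch of Mixup training on a two-layer ReLU network with 512 hidden
# units, using full-batch Adam with the hyperparameters quoted above.
# The data below is a random placeholder, NOT the paper's X 2 3 dataset.
torch.manual_seed(0)
X = torch.randn(512, 2)              # placeholder 2-d inputs
y = (X[:, 0] > 0).long()             # placeholder binary labels
num_classes = 2

model = nn.Sequential(nn.Linear(2, 512), nn.ReLU(), nn.Linear(512, num_classes))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
alpha = 1.0                          # Mixup Beta(alpha, alpha); value assumed

for step in range(1000):
    # Sample a mixing coefficient and pair each point with a random partner.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(X.size(0))
    x_mix = lam * X + (1 - lam) * X[perm]

    log_probs = torch.log_softmax(model(x_mix), dim=1)
    idx = torch.arange(X.size(0))
    # Mixup loss: the same convex combination applied to the two labels.
    loss = -(lam * log_probs[idx, y] + (1 - lam) * log_probs[idx, y[perm]]).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Training the same model without Mixup corresponds to dropping the permutation and minimizing the standard cross-entropy on (X, y).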
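
A corresponding sketch of the image-classification runs from the Open Datasets and Experiment Setup rows: ResNet-18 on CIFAR-10 with Mixup for 50 epochs at batch size 128, using the same Adam hyperparameters. torchvision's resnet18 and alpha = 1.0 are assumptions; the paper uses Kuang Liu's CIFAR implementation, and its Mixup parameter is not quoted above.

```python
import torch
import torchvision
import torchvision.transforms as T

# Sketch of the ResNet-18 + Mixup run on CIFAR-10: 50 epochs, batch size
# 128, Adam with lr=0.001 and betas (0.9, 0.999) as quoted above.
# torchvision's resnet18 stands in for the Kuang Liu CIFAR implementation.
device = "cuda" if torch.cuda.is_available() else "cpu"
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
alpha = 1.0  # Mixup Beta(alpha, alpha); the paper's exact value is not quoted above

for epoch in range(50):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        perm = torch.randperm(x.size(0), device=device)
        x_mix = lam * x + (1 - lam) * x[perm]

        log_probs = torch.log_softmax(model(x_mix), dim=1)
        idx = torch.arange(x.size(0), device=device)
        loss = -(lam * log_probs[idx, y] + (1 - lam) * log_probs[idx, y[perm]]).mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```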