MixUp as Locally Linear Out-of-Manifold Regularization

Authors: Hongyu Guo, Yongyi Mao, Richong Zhang (pp. 3714-3722)

AAAI 2019

Reproducibility Variable: Result — LLM Response
Research Type: Experimental — The proposed regularizer, AdaMixUp, is empirically evaluated on several benchmark datasets. Extensive experiments demonstrate that AdaMixUp improves upon MixUp when applied to the current art of deep classification models.
Researcher Affiliation: Academia — Hongyu Guo, National Research Council Canada, 1200 Montreal Road, Ottawa (hongyu.guo@nrc-cnrc.gc.ca); Yongyi Mao, School of Electrical Engineering & Computer Science, University of Ottawa, Ottawa, Ontario (yymao@eecs.uottawa.ca); Richong Zhang, BDBC, School of Computer Science and Engineering, Beihang University, Beijing, China (zhangrc@act.buaa.edu.cn)
Pseudocode: No — The paper describes the process and implementation details but does not include formal pseudocode or algorithm blocks.
Open Source Code: No — The paper does not provide an explicit statement about, or link to, open-source code for the described methodology.
Open Datasets: Yes — We evaluate AdaMixUp on eight benchmarks. MNIST is the popular digit (0-9) recognition dataset... Fashion is an image recognition dataset... SVHN is the Google street view house numbers recognition dataset... Cifar10 is an image classification dataset... Cifar100 is similar to Cifar10... ImageNet-R is the ImageNet-2012 classification dataset (Russakovsky et al. 2014)...
Dataset Splits: Yes — MNIST is the popular digit (0-9) recognition dataset with 60,000 training and 10,000 test gray-level, 784-dimensional images. Cifar10 is an image classification dataset with 10 classes, 50,000 training and 10,000 test samples. Cifar100 is similar to Cifar10 but with 100 classes and 600 images per class. Cifar10-S and Cifar100-S are, respectively, Cifar10 and Cifar100 reduced to contain only 20% of the training samples. ImageNet-R is the ImageNet-2012 classification dataset (Russakovsky et al. 2014) with 1.3 million training images, 50,000 validation images, and 1,000 classes.
Hardware Specification: No — The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models) used for its experiments.
Software Dependencies: No — We test AdaMixUp on two types of baseline networks: a three-layer CNN as implemented in (Wu and others 2016) as the baseline for the easier tasks, MNIST and Fashion, and a ResNet-18 as implemented in (Zagoruyko and Komodakis 2016) for the other six, more difficult tasks. All models examined are trained using mini-batched backprop, as specified in (Wu and others 2016) and (Zagoruyko and Komodakis 2016), for 400 epochs. (Wu and others 2016) cites Tensorpack and (Zagoruyko and Komodakis 2016) cites wide-residual-networks, but no specific version numbers for these or other software dependencies are provided.
Experiment Setup: Yes — All models examined are trained using mini-batched backprop, as specified in (Wu and others 2016) and (Zagoruyko and Komodakis 2016), for 400 epochs. Each reported performance value (accuracy or error rate) is the median of the performance values obtained in the final 10 epochs. We vary the number of filters on each layer of the ResNet-18 with half quarter, quarter, half, and three-quarter of the original number of filters (denoted as base filter)... We downsample the Cifar100 data with 20%, 40%, 60% and 80% of the training samples...
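For context on the training procedure the report summarizes: the paper builds on MixUp, which augments each mini-batch with convex combinations of input pairs and their one-hot labels. The sketch below shows vanilla MixUp only (AdaMixUp additionally learns the admissible mixing policy, which the paper describes but which is omitted here); the function name and the toy batch are illustrative, not from the paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Vanilla MixUp (the baseline AdaMixUp extends): draw a mixing
    coefficient lambda ~ Beta(alpha, alpha), randomly pair examples
    within the batch, and return convex combinations of both the
    inputs and their one-hot labels."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)        # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))      # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix

# Toy batch: 4 samples, 3 features, 2 classes (one-hot labels).
x = np.arange(12, dtype=float).reshape(4, 3)
y = np.eye(2)[[0, 1, 0, 1]]
x_mix, y_mix = mixup_batch(x, y)
```

Because each mixed label is a convex combination of one-hot rows, every row of `y_mix` still sums to 1, so the usual cross-entropy loss applies unchanged.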