On the Generalization Effects of Linear Transformations in Data Augmentation

Authors: Sen Wu, Hongyang Zhang, Gregory Valiant, Christopher Ré

ICML 2020

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
  First, we validate our theoretical insights on MNIST. Based on the insights, we propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data. We validate our proposed scheme on image and text datasets. For example, our method outperforms RandAugment by 1.24% on CIFAR-100 using Wide-ResNet-28-10.

Researcher Affiliation | Academia
  1. Department of Computer Science, Stanford University; 2. Department of Statistics, The Wharton School, University of Pennsylvania. Correspondence to: all authors <{senwu, hongyang, gvaliant, chrismre}@cs.stanford.edu>

Pseudocode | Yes
  Algorithm 1: Uncertainty-based sampling of transformations.

Open Source Code | No
  The paper does not provide any concrete access information (a specific repository link, an explicit code-release statement, or code in the supplementary materials) for the methodology described in this paper.

Open Datasets | Yes
  Datasets and models. We consider the following datasets and models in our experiments. CIFAR-10 and CIFAR-100: The two datasets are colored images with 10 and 100 classes, respectively... Street View House Numbers (SVHN): This dataset contains color house-number images with 73,257 core images for training and 26,032 digits for testing... ImageNet Large-Scale Visual Recognition Challenge (ImageNet): This dataset includes images of 1000 classes, and has a training set with roughly 1.3M images and a validation set with 50,000 images.

Dataset Splits | Yes
  ImageNet Large-Scale Visual Recognition Challenge (ImageNet): This dataset includes images of 1000 classes, and has a training set with roughly 1.3M images and a validation set with 50,000 images.

Hardware Specification | No
  Our experiments are partly run on Stanford University's SOAL cluster hosted in the Department of Management Science and Engineering (cf. https://5harad.com/soal-cluster/). This statement points to a cluster but lacks specific hardware details such as exact GPU/CPU models or memory specifications.

Software Dependencies | No
  The paper mentions various models used (e.g., Wide-ResNet-28-10, ResNet-50, BERT-Large) but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions) needed to replicate the experiments.

Experiment Setup | Yes
  We set L = 2, C = 4 and S = 1 for our experiments on the CIFAR datasets and SVHN. We set L = 2, C = 8 and S = 4 for our experiments on ImageNet. We consider K = 16 transformations in Algorithm 1, including AutoContrast, Brightness, Color, Contrast, Cutout, Equalize, Invert, Mixup, Posterize, Rotate, Sharpness, ShearX, ShearY, Solarize, TranslateX, TranslateY.
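The experiment-setup row can be read as concrete parameters for the paper's Algorithm 1: draw C candidate compositions of L transformations each, score every transformed point by how uncertain the model is about it, and keep the S most uncertain for training. A minimal Python sketch of that loop is below; it is an illustration, not the authors' released code, and it assumes training loss as the uncertainty proxy. The function name `uncertainty_sample`, the `model_loss` callback, and the toy transformation list are all hypothetical placeholders.

```python
import random

def uncertainty_sample(model_loss, x, transforms, L=2, C=4, S=1):
    """Sketch of uncertainty-based sampling of transformations.

    Draws C random compositions of L transformations, scores each
    transformed point with model_loss (used here as a stand-in for
    model uncertainty), and returns the S most uncertain results.
    Defaults L=2, C=4, S=1 match the paper's CIFAR/SVHN setting.
    """
    candidates = []
    for _ in range(C):
        ops = random.sample(transforms, L)   # composition of length L
        x_aug = x
        for op in ops:
            x_aug = op(x_aug)
        candidates.append((model_loss(x_aug), x_aug))
    # Highest-loss (most uncertain) candidates first.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [x_aug for _, x_aug in candidates[:S]]

# Toy usage: K = 16 placeholder "transformations" on a scalar input,
# standing in for the paper's pool (AutoContrast, ..., TranslateY).
toy_transforms = [lambda v, k=k: v + k for k in range(16)]
chosen = uncertainty_sample(abs, 0.0, toy_transforms, L=2, C=4, S=1)
```

For ImageNet the same routine would be called with C=8 and S=4, i.e. more candidate compositions are scored and more of them are kept per example.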