A Kernel Theory of Modern Data Augmentation

Authors: Tri Dao, Albert Gu, Alexander Ratner, Virginia Smith, Chris De Sa, Christopher Ré

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we provide several proof-of-concept applications showing that our theory can be useful for accelerating machine learning workflows, such as reducing the amount of computation needed to train using augmented data, and predicting the utility of a transformation prior to training." and "We empirically validate the first- and second-order approximations, ĝ(w) and g(w), on MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky & Hinton, 2009) datasets, performing rotation, crop, or blur as augmentations, and using either an RBF kernel with random Fourier features (Rahimi & Recht, 2007) or LeNet (details in Appendix E.1) as a base model." (see the feature-map sketch after this table)
Researcher Affiliation | Academia | "1 Department of Computer Science, Stanford University, California, USA; 2 Department of Electrical and Computer Engineering, Carnegie Mellon University, Pennsylvania, USA; 3 Department of Computer Science, Cornell University, New York, USA."
Pseudocode | No | The paper does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Code to reproduce experiments and plots: https://github.com/HazyResearch/augmentation_code"
Open Datasets | Yes | "on MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky & Hinton, 2009) datasets" and "real-world mammography tumor-classification dataset, DDSM (Heath et al., 2000; Clark et al., 2013; Lee et al., 2016)."
Dataset Splits | No | The paper uses MNIST and CIFAR-10 for empirical validation but does not give specific training/validation/test splits (e.g., percentages, sample counts, or an explicit splitting methodology).
Hardware Specification | No | The paper does not report the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run its experiments.
Software Dependencies | No | The paper mentions models such as an RBF kernel and LeNet, but it does not list software dependencies (libraries, frameworks, or solvers) with version numbers needed to replicate the experiments.
Experiment Setup | Yes | "In particular, in Figure 1a, we plot the difference after 10 epochs of SGD training..." and "All models are RBF kernel classifiers with 10,000 random Fourier features..." and "We augment via rotation between −15 and 15 degrees." (a rotation-averaging sketch follows the table)
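
For context on the base model quoted in the Research Type row, below is a minimal sketch of an RBF-kernel classifier approximated with random Fourier features (Rahimi & Recht, 2007). This is not the authors' code; the bandwidth sigma and the stand-in batch are illustrative assumptions, while the 10,000-feature count matches the quote.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Map X so that phi(x) . phi(y) approximates exp(-||x - y||^2 / (2 sigma^2))."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
d, D, sigma = 784, 10_000, 5.0              # flattened 28x28 input; sigma is an assumption
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

X = rng.normal(size=(32, d))                # stand-in batch; the paper uses MNIST / CIFAR-10
Phi = random_fourier_features(X, W, b)      # (32, 10000); feed to any linear classifier
```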
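And a hedged sketch of the Experiment Setup row's rotation augmentation combined with the paper's first-order idea of averaging the feature map over augmented copies. The −15 to 15 degree range comes from the quote; the number of copies, the reduced feature dimension, and the toy image are assumptions.

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)
d, D, sigma = 784, 512, 5.0                       # feature map sized down from 10,000 for brevity
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
phi = lambda x: np.sqrt(2.0 / D) * np.cos(x @ W + b)

def augmented_mean_features(img, n_aug=8):
    """Average phi over rotated copies; angles follow the quoted -15..15 degree range."""
    angles = rng.uniform(-15.0, 15.0, size=n_aug)
    return np.mean([phi(rotate(img, a, reshape=False).ravel()) for a in angles], axis=0)

img = np.zeros((28, 28)); img[8:20, 12:16] = 1.0  # toy MNIST-like digit (assumption)
avg_feat = augmented_mean_features(img)           # train a linear model on these averaged features
```

Training a linear classifier on such averaged features is roughly what the quoted first-order approximation ĝ(w) amounts to for a kernel model, under the assumptions stated above.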