Provable Benefit of Cutout and CutMix for Feature Learning
Authors: Junsoo Oh, Chulhee Yun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our theorems demonstrate that Cutout training can learn low-frequency features that vanilla training cannot, while CutMix training can learn even rarer features that Cutout cannot capture. From this, we establish that CutMix yields the highest test accuracy among the three. Our novel analysis reveals that CutMix training makes the network learn all features and noise vectors evenly regardless of their rarity and strength, which provides an interesting insight into understanding patch-level augmentation. (A hedged code sketch of the Cutout and CutMix augmentations appears after this table.) |
| Researcher Affiliation | Academia | Junsoo Oh, KAIST AI, junsoo.oh@kaist.ac.kr; Chulhee Yun, KAIST AI, chulhee.yun@kaist.ac.kr |
| Pseudocode | No | The paper describes mathematical derivations and algorithms but does not present them in a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology. |
| Open Datasets | Yes | We conduct experiments both in our setting and on real-world data (CIFAR-10) to support our theoretical findings and intuition. |
| Dataset Splits | No | The paper mentions 'training set' and 'test data' but does not explicitly detail validation splits. It states: 'Using a training set sampled from the distribution D, we would like to train our network f W to learn to correctly classify unseen data points from D.' |
| Hardware Specification | Yes | For all experiments described in this section and in Section 5, we use NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions using 'SGD' for optimization but does not provide specific version numbers for software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | For the numerical experiments on our setting, we set the number of patches P = 3, dimension d = 2000, number of data points n = 300, dominant noise strength σd = 0.25, background noise strength σb = 0.15, and feature noise strength α = 0.005. ... For the learner network, we set the slope of negative regime β = 0.1 and the length of the smoothed interval r = 1. We train models using three methods: ERM, Cutout, and CutMix with a learning rate η = 1. (See the configuration sketch below the table.) |
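
As a reference for the two augmentations analyzed in the paper, below is a minimal sketch of patch-level Cutout and CutMix in PyTorch-style Python. The patch size, tensor shapes, and per-batch box placement are illustrative assumptions, not the paper's exact training pipeline.

```python
import torch

def cutout(x: torch.Tensor, patch: int = 8) -> torch.Tensor:
    """Zero out a random square patch of each image in a batch.

    x: images of shape (B, C, H, W). The patch size is an illustrative
    choice, not the paper's exact setting.
    """
    b, _, h, w = x.shape
    out = x.clone()
    for i in range(b):
        top = torch.randint(0, h - patch + 1, (1,)).item()
        left = torch.randint(0, w - patch + 1, (1,)).item()
        out[i, :, top:top + patch, left:left + patch] = 0.0
    return out

def cutmix(x: torch.Tensor, y: torch.Tensor, patch: int = 8):
    """Paste a random patch from a shuffled copy of the batch into each
    image, mixing the labels in proportion to the pasted area.

    y: one-hot (float) labels of shape (B, num_classes), so that the
    mixed target is a convex combination of the two originals.
    """
    b, _, h, w = x.shape
    perm = torch.randperm(b)
    top = torch.randint(0, h - patch + 1, (1,)).item()
    left = torch.randint(0, w - patch + 1, (1,)).item()
    out = x.clone()
    out[:, :, top:top + patch, left:left + patch] = \
        x[perm, :, top:top + patch, left:left + patch]
    lam = 1.0 - (patch * patch) / (h * w)  # fraction of the original kept
    return out, lam * y + (1.0 - lam) * y[perm]
```

For simplicity this sketch samples one box location per batch (the common CutMix convention) rather than per example; either choice preserves the key property that the label mix matches the pixel mix.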
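
For quick reference, the numerical-experiment hyperparameters quoted in the Experiment Setup row can be collected into one configuration. The key names below are hypothetical, chosen for readability; only the values come from the paper's reported setup.

```python
# Hyperparameters quoted from the paper's numerical experiments; the key
# names are illustrative, only the values are taken from the text.
config = {
    "num_patches": 3,          # P
    "dimension": 2000,         # d
    "num_data": 300,           # n
    "dominant_noise": 0.25,    # sigma_d
    "background_noise": 0.15,  # sigma_b
    "feature_noise": 0.005,    # alpha
    "negative_slope": 0.1,     # beta, slope of the activation's negative regime
    "smoothing_interval": 1,   # r, length of the smoothed interval
    "learning_rate": 1.0,      # eta, SGD step size
    "methods": ["ERM", "Cutout", "CutMix"],
}
```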