When and How Mixup Improves Calibration

Authors: Linjun Zhang, Zhun Deng, Kenji Kawaguchi, James Zou

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating natural statistical models. Interestingly, the calibration benefit of Mixup increases as the model capacity increases. We support our theories with experiments on common architectures and datasets.
Researcher Affiliation | Academia | 1 Rutgers University, 2 Harvard University, 3 National University of Singapore, 4 Stanford University.
Pseudocode | Yes | Algorithm 1: The pseudo-labeling algorithm
Open Source Code | No | The paper does not provide a specific link or explicit statement about the availability of open-source code for the described methodology.
Open Datasets | Yes | We used the standard data sets CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009)... We adopted the standard data sets, Kuzushiji-MNIST (Clanuwat et al., 2019), Fashion-MNIST (Xiao et al., 2017), and CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009).
Dataset Splits | No | The paper mentions data augmentation but does not explicitly state specific training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility, relying on
Hardware Specification | Yes | We run experiments with a machine with 10-Core 3.30 GHz Intel Core i9-9820X and four NVIDIA RTX 2080 Ti GPUs with 11 GB GPU memory.
Software Dependencies | No | The paper mentions software like "torchvision.transforms" and "SGD" but does not provide specific version numbers for any software dependencies or frameworks.
Experiment Setup | Yes | For the experiments on the effect of the width, we fixed the depth to be 8 and varied the width from 10 to 3000. For the experiments on the effect of the depth, the depth was varied from 1 to 24 (i.e., from 3 to 26 layers including input/output layers) by fixing the width to be 400 with data-augmentation and 80 without data-augmentation. We used stochastic gradient descent (SGD) with mini-batch size of 64. We set the learning rate to be 0.01 and momentum coefficient to be 0.9. We used the Beta distribution Beta(α, α) with α = 1.0 for Mixup.
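The Experiment Setup row pins down the training recipe quoted from the paper: SGD with mini-batch size 64, learning rate 0.01, momentum 0.9, and Mixup weights drawn from Beta(1.0, 1.0). Since no open-source code is linked, the following is only a minimal PyTorch sketch of a Mixup training loop under those settings; the model, data-loader, and helper names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of Mixup training with the hyperparameters reported above.
# Everything not stated in the paper excerpt (model class, device handling,
# function names) is an illustrative assumption.
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=1.0):
    """Return mixed inputs, the two label sets being mixed, and the weight lambda."""
    lam = np.random.beta(alpha, alpha)                 # lambda ~ Beta(alpha, alpha), alpha = 1.0
    perm = torch.randperm(x.size(0), device=x.device)  # random pairing within the batch
    x_mixed = lam * x + (1.0 - lam) * x[perm]          # convex combination of inputs
    return x_mixed, y, y[perm], lam

def train_epoch(model, loader, optimizer, device="cuda"):
    model.train()
    for x, y in loader:                                # mini-batches of size 64
        x, y = x.to(device), y.to(device)
        x_mixed, y_a, y_b, lam = mixup_batch(x, y, alpha=1.0)
        logits = model(x_mixed)
        # The loss applies the same convex combination to the two label sets.
        loss = lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Optimizer as described in the table: SGD with lr = 0.01 and momentum = 0.9.
# model = SomeNetwork(width=400, depth=8)   # hypothetical architecture matching the sweep
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```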
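The paper's central claim concerns calibration, but the excerpt above does not reproduce the evaluation code. The sketch below assumes the standard expected calibration error (ECE) with equal-width confidence bins, which is the metric commonly used for such claims; the bin count and function name are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of expected calibration error (ECE); details are assumptions.
import torch
import torch.nn.functional as F

def expected_calibration_error(logits, labels, n_bins=15):
    probs = F.softmax(logits, dim=1)
    confidences, predictions = probs.max(dim=1)       # top-class confidence and prediction
    accuracies = predictions.eq(labels).float()
    ece = torch.zeros(1)
    bin_edges = torch.linspace(0, 1, n_bins + 1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - confidence| in the bin, weighted by the fraction of samples it holds
            gap = (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs()
            ece += in_bin.float().mean() * gap
    return ece.item()
```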