MixMatch: A Holistic Approach to Semi-Supervised Learning

Authors: David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, Colin A. Raffel

NeurIPS 2019

Reproducibility variables (each entry lists the variable, the result, and the supporting LLM response):
Research Type: Experimental. "Experimentally, we show that MixMatch obtains state-of-the-art results on all standard image benchmarks (section 4.2), reducing the error rate on CIFAR-10 by a factor of 4. We further show in an ablation study that MixMatch is greater than the sum of its parts. We demonstrate in section 4.3 that MixMatch is useful for differentially private learning, enabling students in the PATE framework [36] to obtain new state-of-the-art results that simultaneously strengthen both privacy guarantees and accuracy. In short, MixMatch introduces a unified loss term for unlabeled data that seamlessly reduces entropy while maintaining consistency and remaining compatible with traditional regularization techniques."
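
The unified loss term quoted above has a simple form: a cross-entropy term on the labeled batch plus a weighted squared-error term on the unlabeled batch. Below is a minimal NumPy sketch, not the authors' code; the function and argument names are illustrative, and the inputs are assumed to be arrays of class probabilities and targets:

```python
import numpy as np

def mixmatch_loss(probs_x, targets_x, probs_u, guesses_u, lam_u):
    """Combined MixMatch objective L = L_X + lambda_U * L_U (sketch).

    L_X: cross-entropy between predictions on the mixed labeled batch
    and its (possibly non-one-hot, MixUp-mixed) targets.
    L_U: squared L2 distance between predictions on the mixed unlabeled
    batch and its guessed labels, normalized by the number of classes.
    """
    num_classes = probs_x.shape[1]
    loss_x = -np.mean(np.sum(targets_x * np.log(probs_x + 1e-12), axis=1))
    loss_u = np.mean(np.sum((guesses_u - probs_u) ** 2, axis=1)) / num_classes
    return loss_x + lam_u * loss_u
```

The paper motivates the squared-error (Brier) term for unlabeled data by noting that, unlike cross-entropy, it is bounded and less sensitive to completely incorrect predictions, making it a more forgiving consistency loss against guessed labels.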
Researcher Affiliation: Industry. David Berthelot (Google Research, dberth@google.com); Nicholas Carlini (Google Research, ncarlini@google.com); Ian Goodfellow (work done at Google, ian-academic@mailfence.com); Avital Oliver (Google Research, avitalo@google.com); Nicolas Papernot (Google Research, papernot@google.com); Colin Raffel (Google Research, craffel@google.com).
Pseudocode: Yes. "The full MixMatch algorithm is provided in algorithm 1, and a diagram of the label guessing process is shown in fig. 1. Algorithm 1: MixMatch takes a batch of labeled data X and a batch of unlabeled data U and produces a collection X′ (resp. U′) of processed labeled examples (resp. unlabeled examples with guessed labels)."
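
The label-guessing step that Algorithm 1 and fig. 1 describe can be sketched compactly. In the snippet below, `model` is assumed to map a batch to class probabilities and `augment` to return a stochastically augmented copy of the batch; both are hypothetical stand-ins, with K = 2 and T = 0.5 matching the defaults reported in the experiment setup:

```python
import numpy as np

def guess_labels(model, u_batch, augment, K=2, T=0.5):
    """Guess labels for an unlabeled batch (sketch of one step of Algorithm 1).

    Average the model's predictions over K stochastic augmentations of
    the batch, then sharpen the averaged distribution with temperature T:
    Sharpen(p, T)_i = p_i^(1/T) / sum_j p_j^(1/T). As T -> 0 this
    approaches a one-hot distribution, which is what lowers the entropy
    of the guessed labels.
    """
    avg = np.mean([model(augment(u_batch)) for _ in range(K)], axis=0)
    sharpened = avg ** (1.0 / T)
    return sharpened / sharpened.sum(axis=1, keepdims=True)
```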
Open Source Code: Yes. "We release all code used in our experiments." (https://github.com/google-research/mixmatch)
Open Datasets: Yes. "We test the effectiveness of MixMatch on standard SSL benchmarks (section 4.2). First, we evaluate the effectiveness of MixMatch on four standard benchmark datasets: CIFAR-10 and CIFAR-100 [24], SVHN [32], and STL-10 [8]."
Dataset Splits: Yes. "Our implementation of the model and training procedure closely matches that of [35] (including using 5000 examples to select the hyperparameters), except for the following differences: First, instead of decaying the learning rate, we evaluate models using an exponential moving average of their parameters with a decay rate of 0.999."
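
The parameter averaging mentioned in that excerpt replaces learning-rate decay with an exponential moving average of the weights, which is then evaluated in place of the raw weights. A minimal sketch, assuming parameters live in plain dicts of arrays (the released TensorFlow implementation handles this differently):

```python
def ema_update(ema_params, model_params, decay=0.999):
    """One EMA step with the paper's decay rate of 0.999.

    Called after every optimizer update; evaluation then uses
    `ema_params` rather than the live model weights.
    """
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params
```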
Hardware Specification: No. The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications; it only mentions the size of the models used.
Software Dependencies: No. The paper states that 'Our implementation of the model and training procedure closely matches that of [35]', but it does not specify any software names with version numbers (e.g., Python, TensorFlow, PyTorch, or CUDA versions) required for reproduction.
Experiment Setup: Yes. "Since MixMatch combines multiple mechanisms for leveraging unlabeled data, it introduces various hyperparameters, specifically the sharpening temperature T, the number of unlabeled augmentations K, the α parameter for the Beta distribution in MixUp, and the unsupervised loss weight λU. In practice, semi-supervised learning methods with many hyperparameters can be problematic because cross-validation is difficult with small validation sets [35, 39]. However, we find in practice that most of MixMatch's hyperparameters can be fixed and do not need to be tuned on a per-experiment or per-dataset basis. Specifically, for all experiments we set T = 0.5 and K = 2. Further, we only change α and λU on a per-dataset basis; we found that α = 0.75 and λU = 100 are good starting points for tuning. In all experiments, we linearly ramp up λU to its maximum value over the first 16,000 steps of training, as is common practice [44]. ... we apply a weight decay of 0.0004 at each update for the Wide ResNet-28 model. ... For this model, we used a weight decay of 0.0008. We used λU = 75 for CIFAR-10 and λU = 150 for CIFAR-100. ... We used λU = 250. ... For SVHN+Extra we used α = 0.25, λU = 250, and a lower weight decay of 0.000002. ... We used λU = 50."
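
The one schedule in that setup, the linear ramp-up of λU over the first 16,000 steps, is easy to state precisely. A sketch using the quoted CIFAR-10 value λU = 75 as the default maximum (the function name and defaults are illustrative):

```python
def lambda_u_schedule(step, max_lambda_u=75.0, rampup_steps=16_000):
    """Linearly ramp the unsupervised loss weight from 0 to its maximum
    over the first `rampup_steps` training steps, then hold it constant."""
    return max_lambda_u * min(step / rampup_steps, 1.0)
```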