ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring

Authors: David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We improve the recently-proposed MixMatch semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. ... ReMixMatch, is significantly more data-efficient than prior work, requiring between 5× and 16× less data to reach the same accuracy. For example, on CIFAR-10 with 250 labeled examples we reach 93.73% accuracy ... We call our improved algorithm ReMixMatch and experimentally validate it on a suite of standard SSL image benchmarks. ReMixMatch achieves state-of-the-art accuracy across all labeled data amounts...
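The distribution alignment technique mentioned above can be illustrated with a toy sketch: each guessed label distribution is scaled by the ratio of the labeled-class marginal to a running average of the model's guesses, then renormalized. This is a minimal NumPy sketch based on the paper's description; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def distribution_alignment(guess, labeled_marginal, model_marginal):
    # Scale the guess by the ratio of the labeled-class marginal to a running
    # average of the model's guesses, then renormalize to a valid distribution.
    aligned = guess * labeled_marginal / model_marginal
    return aligned / aligned.sum()

# Toy 3-class example: the model over-predicts class 0 relative to the
# labeled marginal, so alignment shifts mass toward the under-predicted classes.
guess = np.array([0.7, 0.2, 0.1])
labeled_marginal = np.full(3, 1.0 / 3.0)
model_marginal = np.array([0.5, 0.3, 0.2])
aligned = distribution_alignment(guess, labeled_marginal, model_marginal)
print(aligned.round(3))
```

Aggregated over many unlabeled examples, this nudges the empirical distribution of guessed labels toward the marginal class distribution observed on the labeled data.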
Researcher Affiliation | Industry | David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel, Google Research {dberth,ncarlini,cubuk,kurakin,zhanghan,craffel}@google.com; Kihyuk Sohn, Google Cloud AI, kihyuks@google.com
Pseudocode | Yes | Algorithm 1 ReMixMatch algorithm for producing a collection of processed labeled examples and processed unlabeled examples with label guesses (cf. Berthelot et al. (2019) Algorithm 1.)
Open Source Code | Yes | We make our code and data open-source at https://github.com/google-research/remixmatch.
Open Datasets | Yes | For example, on CIFAR-10 with 250 labeled examples we reach 93.73% accuracy... We experimentally validate it on a suite of standard SSL image benchmarks. ... CIFAR-10 Our results on CIFAR-10 are shown in table 1, left. SVHN Results for SVHN are shown in table 1, right. The STL-10 dataset consists of 5,000 labeled 96×96 color images drawn from 10 classes and 100,000 unlabeled images...
Dataset Splits | Yes | We follow the Realistic Semi-Supervised Learning (Oliver et al., 2018) recommendations for performing SSL evaluations. ... The STL-10 dataset consists of 5,000 labeled... The labeled set is partitioned into ten pre-defined folds of 1,000 images each. For efficiency, we only run our analysis on five of these ten folds. ... We sort the table by error rate over five different splits (i.e., 40-label subsets) of the training data.
Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or cloud instance names) are mentioned in the paper. It only mentions using a 'WideResNet-28-2'.
Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2015)' as an optimizer, but does not provide specific version numbers for any software dependencies, libraries, or programming languages.
Experiment Setup | Yes | ReMixMatch introduces two new hyperparameters: the weight on the rotation loss λ_r and the weight on the un-augmented example λ_Û1. In practice both are fixed at λ_r = λ_Û1 = 0.5. ReMixMatch also shares many hyperparameters with MixMatch: the weight for the unlabeled loss λ_U, the sharpening temperature T, the MixUp Beta parameter, and the number of augmentations K. All experiments (unless otherwise stated) use T = 0.5, Beta = 0.75, and λ_U = 1.5. We found using a larger number of augmentations monotonically increases accuracy, and so set K = 8 for all experiments (as running with K augmentations increases computation by a factor of K). We train our models using Adam (Kingma & Ba, 2015) with a fixed learning rate of 0.002 and weight decay (Zhang et al., 2018) with a fixed value of 0.02. We take the final model as an exponential moving average over the trained model weights with a decay of 0.999.
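Two of the quoted setup details can be made concrete with small sketches: temperature sharpening of a guessed label distribution (T = 0.5) and the exponential moving average over model weights (decay 0.999). This is a hedged toy sketch under those stated hyperparameters; the function names are illustrative and the real training loop applies these per-parameter across a full network.

```python
import numpy as np

def sharpen(p, T=0.5):
    # Raise each probability to 1/T and renormalize; with T = 0.5 this
    # squares the probabilities, pushing the guess toward its argmax.
    p = np.asarray(p, dtype=float) ** (1.0 / T)
    return p / p.sum()

def ema_update(ema, weights, decay=0.999):
    # One exponential-moving-average step over model weights; the paper
    # evaluates the EMA model, using a decay of 0.999.
    return decay * ema + (1.0 - decay) * weights

# Sharpening makes a soft guess more confident.
sharp = sharpen(np.array([0.6, 0.3, 0.1]))

# The EMA slowly tracks the (here constant) trained weights.
ema = np.zeros(1)
for _ in range(1000):
    ema = ema_update(ema, np.ones(1))
```

After 1000 steps the EMA has moved to 1 − 0.999^1000 ≈ 0.63 of the way toward the current weights, which is why the averaged model lags, and smooths, the raw trained model.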