Unsupervised Data Augmentation for Consistency Training

Authors: Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, Quoc Le

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 5.43 with only 250 examples. Our method also combines well with transfer learning, e.g., when finetuning from BERT, and yields improvements in high-data regime, such as ImageNet, whether when there is only 10% labeled data or when a full labeled set with 1.3M extra unlabeled examples is used.
Researcher Affiliation | Collaboration | 1 Google Research, Brain Team; 2 Carnegie Mellon University
Pseudocode | No | The paper describes the proposed method and provides a formal objective function, but it does not include any explicitly labeled pseudocode or algorithm blocks. (A hedged sketch of such an objective is given below the table.)
Open Source Code | Yes | Code is available at https://github.com/google-research/uda.
Open Datasets | Yes | We evaluate UDA on a wide variety of language and vision tasks. For language, we rely on six text classification benchmark datasets, including IMDb, Yelp-2, Yelp-5, Amazon-2 and Amazon-5 sentiment classification and DBPedia topic classification [20, 21]. For vision, we employ two smaller datasets CIFAR-10 [22], SVHN [23], which are often used to compare semi-supervised algorithms, as well as ImageNet [24] of a larger scale to test the scalability of UDA.
Dataset Splits | No | The paper refers to using varying amounts of labeled data for training (e.g., "4k and 1k labeled examples are used for CIFAR-10 and SVHN respectively") and to common benchmark datasets, but it does not give specific percentages, counts, or instructions for how the datasets were split into training, validation, and test sets for reproducibility.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries/frameworks).
Experiment Setup | Yes | We set λ to 1 for most of our experiments. ... We set the threshold β to a high value. Specifically, β is set to 0.8 for CIFAR-10 and SVHN and 0.5 for ImageNet. ... We set [...] to 0.4 for CIFAR-10, SVHN and ImageNet. (A sketch of how λ and β enter the unsupervised loss follows the table.)
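
Since the paper states its objective only in prose and formulas, the following is a minimal sketch of a UDA-style consistency objective: supervised cross-entropy on the labeled batch plus a KL-divergence consistency term between predictions on unlabeled examples and their augmented versions. This is an illustration under assumptions, written in PyTorch even though the official repository linked above is TensorFlow; the function and argument names are invented for the sketch, not taken from the authors' code.

```python
# Hedged sketch of a UDA-style objective; names and framework (PyTorch) are
# assumptions, not the authors' implementation (their released code is TensorFlow).
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, x_unlabeled_aug, lam=1.0):
    """Supervised cross-entropy plus a consistency term on unlabeled data.

    `lam` plays the role of the weighting factor lambda quoted as 1 above; the
    augmentation producing `x_unlabeled_aug` (e.g. RandAugment or
    back-translation) is assumed to happen outside this function.
    """
    # Supervised term on the small labeled batch.
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # Predictions on the original unlabeled examples act as fixed targets.
    with torch.no_grad():
        p_orig = F.softmax(model(x_unlabeled), dim=-1)

    # Consistency term: KL divergence between predictions on the original
    # and the augmented unlabeled inputs.
    log_p_aug = F.log_softmax(model(x_unlabeled_aug), dim=-1)
    unsup_loss = F.kl_div(log_p_aug, p_orig, reduction="batchmean")

    return sup_loss + lam * unsup_loss
```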
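
The experiment-setup quote also mentions a confidence threshold β (0.8 for CIFAR-10 and SVHN, 0.5 for ImageNet). The sketch below shows one way such confidence-based masking can be applied to the consistency term: unlabeled examples whose most confident predicted class falls below β are dropped from the loss. Again, the function name and signature are hypothetical.

```python
# Hedged sketch of confidence-based masking on the consistency loss;
# `beta` corresponds to the threshold quoted in the experiment setup.
import torch
import torch.nn.functional as F

def masked_consistency_loss(logits_orig, logits_aug, beta=0.8):
    # Fixed targets from the unaugmented unlabeled examples.
    p_orig = F.softmax(logits_orig.detach(), dim=-1)

    # Keep only examples on which the model is already confident.
    mask = (p_orig.max(dim=-1).values >= beta).float()

    # Per-example KL divergence between original and augmented predictions.
    log_p_aug = F.log_softmax(logits_aug, dim=-1)
    kl = F.kl_div(log_p_aug, p_orig, reduction="none").sum(dim=-1)

    # Average only over the examples that pass the confidence threshold.
    return (kl * mask).sum() / mask.sum().clamp(min=1.0)
```

Setting β high, as the quote describes, restricts the consistency signal to predictions the model already trusts.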