Unsupervised Data Augmentation for Consistency Training
Authors: Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, Quoc Le
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 5.43 with only 250 examples. Our method also combines well with transfer learning, e.g., when finetuning from BERT, and yields improvements in high-data regime, such as ImageNet, whether when there is only 10% labeled data or when a full labeled set with 1.3M extra unlabeled examples is used. |
| Researcher Affiliation | Collaboration | 1 Google Research, Brain Team, 2 Carnegie Mellon University |
| Pseudocode | No | The paper describes the proposed method and provides a formal objective function, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/google-research/uda. |
| Open Datasets | Yes | We evaluate UDA on a wide variety of language and vision tasks. For language, we rely on six text classification benchmark datasets, including IMDb, Yelp-2, Yelp-5, Amazon-2 and Amazon-5 sentiment classification and DBPedia topic classification [20, 21]. For vision, we employ two smaller datasets CIFAR-10 [22], SVHN [23], which are often used to compare semi-supervised algorithms, as well as ImageNet [24] of a larger scale to test the scalability of UDA. |
| Dataset Splits | No | The paper refers to using varying amounts of labeled data for training (e.g., "4k and 1k labeled examples are used for CIFAR-10 and SVHN respectively") and common benchmark datasets. However, it does not provide explicit percentages, counts, or instructions for how the overall datasets were split into training, validation, and test sets for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries/frameworks). |
| Experiment Setup | Yes | We set λ to 1 for most of our experiments. ... We set the threshold β to a high value. Specifically, β is set to 0.8 for CIFAR-10 and SVHN and 0.5 for ImageNet. ... We set to 0.4 for CIFAR-10, SVHN and ImageNet. |
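
The experiment-setup row above quotes a weighting factor λ, a confidence threshold β, and a value of 0.4, all of which enter the consistency-training objective the paper describes. The following is a minimal, hypothetical PyTorch sketch of how such an objective could be assembled; the function name, argument names, and the reading of 0.4 as a softmax sharpening temperature are illustrative assumptions, not code from the official repository.

```python
# Hypothetical sketch of a UDA-style loss: supervised cross-entropy plus a
# lambda-weighted consistency term between predictions on unlabeled examples
# and their augmented versions. Names are illustrative, not from the paper's repo.
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, x_unlabeled_aug,
             lam=1.0, conf_threshold=0.8, temperature=0.4):
    # Supervised term: standard cross-entropy on the labeled batch.
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # Predictions on the original unlabeled examples, treated as fixed targets.
    with torch.no_grad():
        logits_orig = model(x_unlabeled)
        probs_orig = F.softmax(logits_orig, dim=-1)
        # Confidence-based masking: keep only examples whose top predicted
        # probability exceeds the threshold beta (quoted as 0.8 / 0.5 above).
        mask = (probs_orig.max(dim=-1).values > conf_threshold).float()
        # Sharpen the target distribution with a low softmax temperature
        # (assumed here to be the 0.4 value quoted above).
        targets = F.softmax(logits_orig / temperature, dim=-1)

    # Consistency term: KL divergence between the sharpened targets and the
    # model's predictions on the augmented unlabeled examples.
    log_probs_aug = F.log_softmax(model(x_unlabeled_aug), dim=-1)
    per_example_kl = F.kl_div(log_probs_aug, targets, reduction="none").sum(dim=-1)
    unsup_loss = (per_example_kl * mask).sum() / mask.sum().clamp(min=1.0)

    # lambda (quoted as 1 in the paper) trades off the two terms.
    return sup_loss + lam * unsup_loss
```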