Unsupervised Data Augmentation for Consistency Training
Authors: Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, Quoc Le
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 5.43 with only 250 examples. Our method also combines well with transfer learning, e.g., when finetuning from BERT, and yields improvements in high-data regime, such as ImageNet, whether when there is only 10% labeled data or when a full labeled set with 1.3M extra unlabeled examples is used. |
| Researcher Affiliation | Collaboration | 1 Google Research, Brain Team, 2 Carnegie Mellon University |
| Pseudocode | No | The paper describes the proposed method and provides a formal objective function, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/google-research/uda. |
| Open Datasets | Yes | We evaluate UDA on a wide variety of language and vision tasks. For language, we rely on six text classification benchmark datasets, including IMDb, Yelp-2, Yelp-5, Amazon-2 and Amazon-5 sentiment classification and DBPedia topic classification [20, 21]. For vision, we employ two smaller datasets CIFAR-10 [22], SVHN [23], which are often used to compare semi-supervised algorithms, as well as ImageNet [24] of a larger scale to test the scalability of UDA. |
| Dataset Splits | No | The paper refers to using varying amounts of labeled data for training (e.g., "4k and 1k labeled examples are used for CIFAR-10 and SVHN respectively") and common benchmark datasets. However, it does not provide explicit percentages, counts, or instructions for how the overall datasets were split into training, validation, and test sets for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries/frameworks). |
| Experiment Setup | Yes | We set λ to 1 for most of our experiments. ... We set the threshold β to a high value. Specifically, β is set to 0.8 for CIFAR-10 and SVHN and 0.5 for ImageNet. ... We set to 0.4 for CIFAR-10, SVHN and ImageNet. |
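
The experiment-setup row above quotes a weighting factor λ, a confidence threshold β, and a value of 0.4, all of which enter the consistency-training objective the paper describes. The following is a minimal, hypothetical PyTorch sketch of how such an objective could be assembled; the function name, argument names, and the reading of 0.4 as a softmax sharpening temperature are illustrative assumptions, not code from the official repository.

```python
# Hypothetical sketch of a UDA-style loss: supervised cross-entropy plus a
# lambda-weighted consistency term between predictions on unlabeled examples
# and their augmented versions. Names are illustrative, not from the paper's repo.
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, x_unlabeled_aug,
             lam=1.0, conf_threshold=0.8, temperature=0.4):
    # Supervised term: standard cross-entropy on the labeled batch.
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # Predictions on the original unlabeled examples, treated as fixed targets.
    with torch.no_grad():
        logits_orig = model(x_unlabeled)
        probs_orig = F.softmax(logits_orig, dim=-1)
        # Confidence-based masking: keep only examples whose top predicted
        # probability exceeds the threshold beta (quoted as 0.8 / 0.5 above).
        mask = (probs_orig.max(dim=-1).values > conf_threshold).float()
        # Sharpen the target distribution with a low softmax temperature
        # (assumed here to be the 0.4 value quoted above).
        targets = F.softmax(logits_orig / temperature, dim=-1)

    # Consistency term: KL divergence between the sharpened targets and the
    # model's predictions on the augmented unlabeled examples.
    log_probs_aug = F.log_softmax(model(x_unlabeled_aug), dim=-1)
    per_example_kl = F.kl_div(log_probs_aug, targets, reduction="none").sum(dim=-1)
    unsup_loss = (per_example_kl * mask).sum() / mask.sum().clamp(min=1.0)

    # lambda (quoted as 1 in the paper) trades off the two terms.
    return sup_loss + lam * unsup_loss
```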