Rethinking Pre-training and Self-training

Authors: Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin Dogus Cubuk, Quoc Le

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our work studies self-training with a focus on answering the above question. We define a set of control experiments where we use ImageNet as additional data with the goal of improving COCO. We vary the amount of labeled data in COCO and the strength of data augmentation as control factors. Our experiments show that as we increase the strength of data augmentation or the amount of labeled data, the value of pre-training diminishes.
Researcher Affiliation | Industry | Google Research, Brain Team {barretzoph,golnazg,tsungyi,yincui,hanxiaol,cubuk,qvl}@google.com
Pseudocode | No | The paper describes methods in textual form but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and checkpoints for our models are available at https://github.com/tensorflow/tpu/tree/master/models/official/detection/projects/self_training
Open Datasets | Yes | We use COCO dataset [58] (118k images) for supervised learning. In self-training, we experiment with ImageNet [59] (1.2M images) and Open Images [60] (1.7M images) as unlabeled datasets. We use the train set (1.5k images) of PASCAL VOC 2012 segmentation dataset [64] for supervised learning.
Dataset Splits | Yes | For all experiments using different augmentation strengths and dataset sizes, we allow each model to train until it converges (when training longer stops helping or even hurts performance on a held-out validation set). Eff-B7 models (Eff) are trained on the PASCAL train set for validation results and on train+val for test results.
Hardware Specification | No | The paper does not explicitly provide hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. It mentions 'tpu' in the GitHub link, but this is part of the repository path, not a statement about the hardware used in their experiments.
Software Dependencies | No | The paper mentions TensorFlow implicitly through the GitHub link, but it does not specify version numbers for any software, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | The training batch size is 256 with weight decay 1e-4. The model is trained with learning rate 0.32 and a cosine learning rate decay schedule [62]. At the beginning of training, the learning rate is linearly increased over the first 1000 steps from 0.0032 to 0.32. For semantic segmentation: the learning rate is set to 0.08 for EfficientNet-B7 and 0.2 for EfficientNet-L2, with batch size 256 and weight decay 1e-5.