Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Insights into Pre-training via Simpler Synthetic Tasks

Authors: Yuhuai Wu, Felix Li, Percy S. Liang

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we perform three experiments that iteratively simplify pre-training and show that the simplifications still retain much of its gains. First, building on prior work, we perform a systematic evaluation of three existing synthetic pre-training methods on six downstream tasks.
Researcher Affiliation | Collaboration | Yuhuai Wu (Stanford University, Google Research), Felix Li (UC Berkeley), Percy Liang (Stanford University)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | We release the source code at https://github.com/felixzli/synthetic_pretraining.
Open Datasets | Yes | We fine-tuned synthetically pre-trained models on a diverse suite of downstream tasks: 1) Java to C# code translation (10K training examples) (Lu et al., 2021); 2) two semantic parsing benchmarks, MTOP (17K training examples) (Li et al., 2021) and WebQSP (2.7K training examples) (Yih et al., 2016)... 3) USPTO-50K retrosynthesis (40K training examples) (Liu et al., 2017)... 4) the reading comprehension benchmark SQuAD 1.1 (87K training examples) (Rajpurkar et al., 2016); and 5) the summarization benchmark CNNDM-10K, which is 10K training examples from CNNDM (Krishna et al., 2021).
Dataset Splits | Yes | For synthetic pre-training, we use the same hyperparameters that the off-the-shelf language pre-trained T5-small was trained with: Adafactor optimizer, batch size 128, sequence length 512, and inverse square root learning rate 1/√max(n, 10000), where n is the current training step. We evaluate token validation accuracy every 5000 training steps.
Hardware Specification | No | The paper mentions "Google TPU Research Cloud" for experimental support, but does not specify particular TPU versions (e.g., v2, v3, v4) or other specific hardware models (GPU/CPU).
Software Dependencies | No | The paper does not provide specific software dependency names with version numbers in the main text.
Experiment Setup | Yes | Training Details: For synthetic pre-training, we use the same hyperparameters that the off-the-shelf language pre-trained T5-small was trained with: Adafactor optimizer, batch size 128, sequence length 512, and inverse square root learning rate 1/√max(n, 10000), where n is the current training step.
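For reference, the inverse square root schedule quoted above can be sketched in a few lines of Python. This is a minimal illustration of the formula 1/√max(n, 10000), not the paper's implementation; the function name and the `scale` parameter are ours:

```python
import math

def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000, scale: float = 1.0) -> float:
    """Inverse square root schedule: lr = scale / sqrt(max(step, warmup_steps)).

    The rate is held constant at scale / sqrt(warmup_steps) for the first
    warmup_steps steps, then decays proportionally to 1 / sqrt(step).
    """
    return scale / math.sqrt(max(step, warmup_steps))

# Flat until step 10,000, then decaying:
print(inverse_sqrt_lr(1))        # 0.01
print(inverse_sqrt_lr(10_000))   # 0.01
print(inverse_sqrt_lr(40_000))   # 0.005
```

Note how the `max` clamp acts as a built-in constant warmup: the schedule never exceeds 1/√10000 = 0.01, which avoids large early updates without a separate warmup phase.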