Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Insights into Pre-training via Simpler Synthetic Tasks
Authors: Yuhuai Wu, Felix Li, Percy S. Liang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we perform three experiments that iteratively simplify pre-training and show that the simplifications still retain much of its gains. First, building on prior work, we perform a systematic evaluation of three existing synthetic pre-training methods on six downstream tasks. |
| Researcher Affiliation | Collaboration | Yuhuai Wu (1, 2), Felix Li (3), Percy Liang (1); 1: Stanford University, 2: Google Research, 3: UC Berkeley |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release the source code at https://github.com/felixzli/synthetic_pretraining. |
| Open Datasets | Yes | We fine-tuned synthetically pre-trained models on a diverse suite of downstream tasks: 1) Java to C# code translation (10K training examples) (Lu et al., 2021); 2) two semantic parsing benchmarks, MTOP (17K training examples) (Li et al., 2021) and WebQSP (2.7K training examples) (Yih et al., 2016)... 3) USPTO-50K retrosynthesis (40K training examples) (Liu et al., 2017)... 4) the reading comprehension benchmark SQuAD 1.1 (87K training examples) (Rajpurkar et al., 2016); and 5) the summarization benchmark CNNDM-10K, which is 10K training examples from CNNDM (Krishna et al., 2021). |
| Dataset Splits | Yes | For synthetic pre-training, we use the same hyperparameters that the off-the-shelf language pre-trained T5-small was trained with: Adafactor optimizer, batch size 128, sequence length 512, and inverse square root learning rate 1/√max(n, 10000), where n is the current training step. We evaluate token validation accuracy every 5000 training steps. |
| Hardware Specification | No | The paper mentions "Google TPU Research Cloud" for experimental support, but does not specify particular TPU versions (e.g., v2, v3, v4) or other specific hardware models (GPU/CPU). |
| Software Dependencies | No | The paper does not provide specific software dependency names with version numbers in the main text. |
| Experiment Setup | Yes | Training Details: For synthetic pre-training, we use the same hyperparameters that the off-the-shelf language pre-trained T5-small was trained with: Adafactor optimizer, batch size 128, sequence length 512, and inverse square root learning rate 1/√max(n, 10000), where n is the current training step. |
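The learning-rate schedule quoted above, 1/√max(n, 10000), holds the rate constant at 0.01 for the first 10,000 steps and then decays it with the inverse square root of the step count. The sketch below illustrates that formula together with the quoted 5000-step evaluation interval. It is not taken from the released repository: the function names `inverse_sqrt_lr` and `is_eval_step` are hypothetical, step counting is assumed to start at 0, and the Adafactor optimizer itself is not modeled.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Hypothetical helper: learning rate 1 / sqrt(max(step, warmup_steps)).

    Constant (1 / sqrt(warmup_steps)) until warmup_steps, then decays as 1 / sqrt(step).
    """
    return 1.0 / math.sqrt(max(step, warmup_steps))


def is_eval_step(step: int, eval_every: int = 5_000) -> bool:
    """Hypothetical helper: True on steps where token validation accuracy would be evaluated."""
    return step > 0 and step % eval_every == 0


if __name__ == "__main__":
    # Print the schedule at a few representative steps.
    for step in (0, 5_000, 10_000, 40_000, 1_000_000):
        print(step, inverse_sqrt_lr(step), is_eval_step(step))
```

With the default threshold, the rate is 0.01 at steps 0 through 10,000, 0.005 at step 40,000, and 0.001 at step 1,000,000.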