Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

Authors: Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul N. Bennett, Jiawei Han, Xia Song

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our experiments on the GLUE and SQuAD benchmarks demonstrate the effectiveness of AMOS. |
| Researcher Affiliation | Collaboration | ¹University of Illinois at Urbana-Champaign, ²Microsoft. ¹{yumeng5,hanj}@illinois.edu, ²{chenyan.xiong,payal.bajaj,satiwary,paul.n.bennett,xiaso}@microsoft.com |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code and pretrained models can be found at https://github.com/microsoft/AMOS. |
| Open Datasets | Yes | Pretraining on Wikipedia and BookCorpus (Zhu et al., 2015) (16 GB of text) for 256 million samples... We add in OpenWebText (Gokaslan & Cohen, 2019), CC-News (Liu et al., 2019) and STORIES (Trinh & Le, 2018), to a total of 160 GB of text... |
| Dataset Splits | Yes | All models are evaluated with the same standard fine-tuning protocols: single-task learning with vanilla fine-tuning, reporting the median of five random seeds on GLUE and SQuAD. Please refer to Appendix A for more details. ... The reported downstream task results on GLUE/SQuAD are the median of five runs with the same set of random seeds. (A minimal sketch of this protocol appears below the table.) |
| Hardware Specification | Yes | All experiments in this paper are conducted on 64 A100 GPUs each with 40GB memory size. |
| Software Dependencies | No | Our implementation builds upon the open-source implementation of fairseq (Ott et al., 2019). While fairseq is mentioned as a dependency, no specific version number for it or other software components is provided. |
| Experiment Setup | Yes | Other hyperparameters used in pretraining and fine-tuning are reported in Tables 5 and 6, respectively. (Tables 5 and 6 detail parameters such as Max Steps, Peak Learning Rate, Batch Size, Warm-up Steps, Sequence Length, Adam ϵ, Adam (β1, β2), Clip Norm, and Dropout for both pretraining and fine-tuning. A configuration sketch appears below the table.) |
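The Dataset Splits row describes the reporting protocol: for each task, five fine-tuning runs with a fixed set of random seeds, with the median score reported. The sketch below illustrates that protocol only; the `fine_tune_and_evaluate` helper and the seed values are placeholders, not part of the AMOS release.

```python
# Minimal sketch of the median-of-five-seeds reporting protocol described above.
from statistics import median


def fine_tune_and_evaluate(task: str, seed: int) -> float:
    """Placeholder for a single fine-tuning run: train the pretrained encoder
    on `task` with random seed `seed` and return the dev-set metric."""
    raise NotImplementedError  # hypothetical helper, not provided by the paper


def report_task_score(task: str, seeds=(1, 2, 3, 4, 5)) -> float:
    # The same set of five seeds is reused across all compared models;
    # the median of the five runs is the reported score for the task.
    scores = [fine_tune_and_evaluate(task, seed) for seed in seeds]
    return median(scores)
```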
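The Experiment Setup row names the hyperparameter fields reported in Tables 5 and 6 of the paper. The sketch below only mirrors those field names as a configuration container, with one instance per stage; the values are left as `None` placeholders rather than the settings reported in the paper.

```python
# Hedged sketch of the hyperparameter groups listed in Tables 5 and 6.
# Field names follow the table row labels quoted above; values are placeholders.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StageHyperparameters:
    max_steps: Optional[int] = None                    # "Max Steps"
    peak_learning_rate: Optional[float] = None         # "Peak Learning Rate"
    batch_size: Optional[int] = None                   # "Batch Size"
    warmup_steps: Optional[int] = None                 # "Warm-up Steps"
    sequence_length: Optional[int] = None              # "Sequence Length"
    adam_epsilon: Optional[float] = None               # "Adam ϵ"
    adam_betas: Optional[Tuple[float, float]] = None   # "Adam (β1, β2)"
    clip_norm: Optional[float] = None                  # "Clip Norm"
    dropout: Optional[float] = None                    # "Dropout"


# One instance per stage, mirroring Table 5 (pretraining) and Table 6 (fine-tuning).
pretraining_config = StageHyperparameters()
finetuning_config = StageHyperparameters()
```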