Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators
Authors: Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul N. Bennett, Jiawei Han, Xia Song
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the GLUE and SQuAD benchmarks demonstrate the effectiveness of AMOS. |
| Researcher Affiliation | Collaboration | ¹University of Illinois at Urbana-Champaign, ²Microsoft; ¹{yumeng5,hanj}@illinois.edu, ²{chenyan.xiong,payal.bajaj,satiwary,paul.n.bennett,xiaso}@microsoft.com |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code and pretrained models can be found at https://github.com/microsoft/AMOS. |
| Open Datasets | Yes | Pretraining on Wikipedia and BookCorpus (Zhu et al., 2015) (16 GB of texts) for 256 million samples... We add in OpenWebText (Gokaslan & Cohen, 2019), CC-News (Liu et al., 2019) and STORIES (Trinh & Le, 2018), to a total of 160 GB texts... |
| Dataset Splits | Yes | All models are evaluated with the same standard fine-tuning protocols: single-task learning with vanilla fine-tuning, reporting the median of five random seeds on GLUE and SQuAD. Please refer to Appendix A for more details. ... The reported downstream task results on GLUE/SQuAD are the median of five runs with the same set of random seeds. |
| Hardware Specification | Yes | All experiments in this paper are conducted on 64 A100 GPUs each with 40GB memory size. |
| Software Dependencies | No | Our implementation builds upon the open-source implementation of fairseq (Ott et al., 2019). While fairseq is mentioned as a dependency, no specific version number for it or other software components is provided. |
| Experiment Setup | Yes | Other hyperparameters used in pretraining and fine-tuning are reported in Tables 5 and 6, respectively. (Tables 5 and 6 detail parameters like Max Steps, Peak Learning Rate, Batch Size, Warm-up Steps, Sequence Length, Adam ϵ, Adam (β1, β2), Clip Norm, Dropout for both pretraining and fine-tuning). |
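The evaluation protocol quoted above (median over five fine-tuning runs with a fixed set of random seeds on GLUE and SQuAD) can be expressed as a short script. The sketch below is a hypothetical illustration, not code from the paper or its repository: the `fine_tune` callable, the seed values, and the task list are all assumptions made for demonstration only.

```python
# Hypothetical sketch of the reporting protocol described in the paper:
# fine-tune each downstream task once per seed and report the median metric.
# The `fine_tune` stub, seed values, and task names are placeholders, not the
# authors' actual implementation.
from statistics import median
from typing import Callable, Dict, List

SEEDS: List[int] = [1, 2, 3, 4, 5]  # the paper fixes one set of five seeds; values here are placeholders


def report_median_scores(
    fine_tune: Callable[[str, int], float],  # assumed stub: (task, seed) -> dev-set metric
    tasks: List[str],
) -> Dict[str, float]:
    """Run five fine-tuning jobs per task and keep the median score."""
    results: Dict[str, float] = {}
    for task in tasks:
        scores = [fine_tune(task, seed) for seed in SEEDS]
        results[task] = median(scores)
    return results


if __name__ == "__main__":
    # Dummy fine-tuning function so the sketch runs end-to-end.
    import random

    def dummy_fine_tune(task: str, seed: int) -> float:
        random.seed(hash((task, seed)) % (2 ** 32))
        return 80.0 + random.random()  # placeholder metric, not a real result

    print(report_median_scores(dummy_fine_tune, ["MNLI", "SST-2", "SQuAD"]))
```

Reporting the median rather than the mean makes the numbers less sensitive to a single unstable fine-tuning run, which is why a fixed seed set is part of the protocol.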