Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators
Authors: Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul N. Bennett, Jiawei Han, Xia Song
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the GLUE and SQuAD benchmarks demonstrate the effectiveness of AMOS. |
| Researcher Affiliation | Collaboration | ¹University of Illinois at Urbana-Champaign, ²Microsoft; ¹{yumeng5,hanj}@illinois.edu, ²{chenyan.xiong,payal.bajaj,satiwary,paul.n.bennett,xiaso}@microsoft.com |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code and pretrained models can be found at https://github.com/microsoft/AMOS. |
| Open Datasets | Yes | Pretraining on Wikipedia and BookCorpus (Zhu et al., 2015) (16 GB of texts) for 256 million samples... We add in OpenWebText (Gokaslan & Cohen, 2019), CC-News (Liu et al., 2019) and STORIES (Trinh & Le, 2018), to a total of 160 GB texts... |
| Dataset Splits | Yes | All models are evaluated with the same standard fine-tuning protocols: single-task learning with vanilla fine-tuning, reporting the median of five random seeds on GLUE and SQuAD. Please refer to Appendix A for more details. ... The reported downstream task results on GLUE/SQuAD are the median of five runs with the same set of random seeds. |
| Hardware Specification | Yes | All experiments in this paper are conducted on 64 A100 GPUs each with 40GB memory size. |
| Software Dependencies | No | Our implementation builds upon the open-source implementation of fairseq (Ott et al., 2019). While fairseq is mentioned as a dependency, no specific version number for it or other software components is provided. |
| Experiment Setup | Yes | Other hyperparameters used in pretraining and fine-tuning are reported in Tables 5 and 6, respectively. (Tables 5 and 6 detail parameters like Max Steps, Peak Learning Rate, Batch Size, Warm-up Steps, Sequence Length, Adam ϵ, Adam (β1, β2), Clip Norm, Dropout for both pretraining and fine-tuning). |
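The evaluation protocol quoted above (median over five fine-tuning runs with a fixed set of random seeds on GLUE and SQuAD) can be expressed as a short script. The sketch below is a hypothetical illustration, not code from the paper or its repository: the `fine_tune` callable, the seed values, and the task list are all assumptions made for demonstration only.

```python
# Hypothetical sketch of the reporting protocol described in the paper:
# fine-tune each downstream task once per seed and report the median metric.
# The `fine_tune` stub, seed values, and task names are placeholders, not the
# authors' actual implementation.
from statistics import median
from typing import Callable, Dict, List

SEEDS: List[int] = [1, 2, 3, 4, 5]  # the paper fixes one set of five seeds; values here are placeholders


def report_median_scores(
    fine_tune: Callable[[str, int], float],  # assumed stub: (task, seed) -> dev-set metric
    tasks: List[str],
) -> Dict[str, float]:
    """Run five fine-tuning jobs per task and keep the median score."""
    results: Dict[str, float] = {}
    for task in tasks:
        scores = [fine_tune(task, seed) for seed in SEEDS]
        results[task] = median(scores)
    return results


if __name__ == "__main__":
    # Dummy fine-tuning function so the sketch runs end-to-end.
    import random

    def dummy_fine_tune(task: str, seed: int) -> float:
        random.seed(hash((task, seed)) % (2 ** 32))
        return 80.0 + random.random()  # placeholder metric, not a real result

    print(report_median_scores(dummy_fine_tune, ["MNLI", "SST-2", "SQuAD"]))
```

Reporting the median rather than the mean makes the numbers less sensitive to a single unstable fine-tuning run, which is why a fixed seed set is part of the protocol.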