Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators
Authors: Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul N. Bennett, Jiawei Han, Xia Song
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the GLUE and SQuAD benchmark demonstrate the effectiveness of AMOS. |
| Researcher Affiliation | Collaboration | ¹University of Illinois at Urbana-Champaign, ²Microsoft; ¹EMAIL, ²EMAIL |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code and pretrained models can be found at https://github.com/microsoft/AMOS. |
| Open Datasets | Yes | Pretraining on Wikipedia and BookCorpus (Zhu et al., 2015) (16 GB of texts) for 256 million samples... We add in OpenWebText (Gokaslan & Cohen, 2019), CC-News (Liu et al., 2019) and STORIES (Trinh & Le, 2018), to a total of 160 GB texts... |
| Dataset Splits | Yes | All models are evaluated with the same standard fine-tuning protocols: Single task learning with vanilla fine-tuning and reporting the median of five random seeds in GLUE and SQuAD. Please refer to Appendix A for more details. ... The reported downstream task results on GLUE/SQuAD are the median of five runs with the same set of random seeds. |
| Hardware Specification | Yes | All experiments in this paper are conducted on 64 A100 GPUs each with 40GB memory size. |
| Software Dependencies | No | Our implementation builds upon the open-source implementation of fairseq (Ott et al., 2019). While fairseq is mentioned as a dependency, no specific version number for it or other software components is provided. |
| Experiment Setup | Yes | Other hyperparameters used in pretraining and fine-tuning are reported in Tables 5 and 6, respectively. (Tables 5 and 6 detail parameters like Max Steps, Peak Learning Rate, Batch Size, Warm-up Steps, Sequence Length, Adam ϵ, Adam (β1, β2), Clip Norm, Dropout for both pretraining and fine-tuning). |
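
The Dataset Splits row quotes a protocol of reporting the median over five runs with a fixed set of random seeds. As a rough illustration of that aggregation step only, here is a minimal Python sketch. The function name `median_over_seeds`, the seed values, and the scores are all hypothetical placeholders; they are not taken from the paper or from the AMOS repository.

```python
from statistics import median


def median_over_seeds(scores_by_seed):
    """Return the median score across runs, keyed by random seed.

    `scores_by_seed` maps a seed to the metric obtained in that run.
    This is an illustrative helper, not code from the AMOS repository.
    """
    if not scores_by_seed:
        raise ValueError("need at least one run")
    return median(scores_by_seed.values())


# Hypothetical scores for five seeds; placeholder values, not paper results.
example_scores = {1: 88.1, 2: 88.4, 3: 87.9, 4: 88.6, 5: 88.2}

reported = median_over_seeds(example_scores)
print(reported)
```

With an odd number of runs the median is one of the observed scores, so a single outlier seed does not shift the reported number the way it would shift a mean.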