Language Model Pre-training on True Negatives
Authors: Zhuosheng Zhang, Hai Zhao, Masao Utiyama, Eiichiro Sumita
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on GLUE and SQuAD benchmarks show that our counter-false-negative pre-training methods indeed bring about better performance together with stronger robustness. |
| Researcher Affiliation | Academia | (1) Department of Computer Science and Engineering, Shanghai Jiao Tong University; (2) Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China; (3) National Institute of Information and Communications Technology (NICT), Kyoto, Japan |
| Pseudocode | No | The paper describes methods with mathematical formulas and text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | We use the wikitext-2-raw-v1 corpus (Merity et al. 2017) for validation. We use OpenWebText (Radford et al. 2019) to train small models, and Wikipedia and BooksCorpus (Zhu et al. 2015) for training base models following (Clark et al. 2020). (See the dataset-loading sketch after this table.) |
| Dataset Splits | Yes | For evaluation, we fine-tune the pre-trained models on GLUE (General Language Understanding Evaluation) (Wang et al. 2019) and SQuAD v1.1 (Rajpurkar et al. 2016) to evaluate the performance of the pre-trained models. [...] Table 3: Comparisons between our proposed methods and the baseline pre-trained models on the dev set of GLUE tasks. and Table 4: Results on the SQuAD dev set. |
| Hardware Specification | Yes | Please note that it is inadequate to pursue absolute gains for large models by using single-machine NVIDIA V100 GPUs (e.g., slower convergence speed with much smaller batch sizes), compared with TPUs for training large models in public releases (Devlin et al. 2019). |
| Software Dependencies | No | The paper mentions using ELECTRA, BERT, WordNet, and Word2Vec but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For hyper-parameters, the batch size is 128 for the base models in our work instead of 256 as in the original setting due to limited resources. The mask ratio is 15%. We set a maximum number of tokens as 128 for small models and 512 for base models. [...] The learning rates for small and base models are 5e-4, and 5e-5, respectively. (See the configuration sketch below.) |
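
The hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration record. The sketch below is not the authors' code (which is unreleased); the class name `PretrainConfig` and its field names are illustrative assumptions, and only values reported in the excerpt are filled in.

```python
# Minimal sketch of the reported pre-training hyper-parameters as a config
# object. `PretrainConfig` and its field names are assumptions, not the
# authors' (unreleased) code; only values quoted in the paper are set.
from dataclasses import dataclass
from typing import Optional


@dataclass
class PretrainConfig:
    model_size: str                   # "small" or "base"
    mask_ratio: float                 # fraction of input tokens that are masked
    max_seq_length: int               # maximum number of tokens per sequence
    learning_rate: float
    batch_size: Optional[int] = None  # only reported for base models (128 here vs. 256 originally)


SMALL_CONFIG = PretrainConfig("small", mask_ratio=0.15, max_seq_length=128,
                              learning_rate=5e-4)
BASE_CONFIG = PretrainConfig("base", mask_ratio=0.15, max_seq_length=512,
                             learning_rate=5e-5, batch_size=128)
```

All corpora cited in the Open Datasets row are publicly available. The loading sketch below uses the Hugging Face `datasets` library; the hub identifiers (`wikitext`, `openwebtext`, `wikipedia`, `bookcorpus`) point to community-hosted copies and may differ from the exact snapshots the authors used.

```python
# Minimal sketch: loading the public corpora named in the paper via the
# Hugging Face `datasets` library. These hub identifiers are community
# mirrors, not releases by the authors, so snapshots may differ.
# Newer `datasets` versions may require trust_remote_code=True for
# script-based datasets such as openwebtext and bookcorpus.
from datasets import load_dataset

# Validation corpus: wikitext-2-raw-v1 (Merity et al. 2017).
wikitext_val = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")

# Pre-training corpora: OpenWebText for small models;
# English Wikipedia and BooksCorpus for base models.
openwebtext = load_dataset("openwebtext", split="train")
wikipedia = load_dataset("wikipedia", "20220301.en", split="train")
bookcorpus = load_dataset("bookcorpus", split="train")

print(wikitext_val)
```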
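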