ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding
Authors: Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang
AAAI 2020, pp. 8968–8975 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the ERNIE 2.0 model outperforms BERT and XLNet on 16 tasks, including English tasks on the GLUE benchmark and several similar tasks in Chinese. |
| Researcher Affiliation | Industry | Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang Baidu Inc., Beijing, China {sunyu02, wangshuohuan, tianhao, wu_hua, wanghaifeng}@baidu.com |
| Pseudocode | No | The paper describes the model structure and pre-training tasks in detail, but does not include any pseudocode or algorithm blocks (a hedged sketch of the continual multi-task training loop appears after this table). |
| Open Source Code | Yes | The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE. |
| Open Datasets | Yes | For English tasks, we compare our results with BERT (Devlin et al. 2018) and XLNet (Yang et al. 2019) on GLUE. For Chinese tasks, we compare the results with those of BERT (Devlin et al. 2018) and the previous ERNIE 1.0 (Sun et al. 2019) model on several Chinese datasets. ... Discovery data (Sileo et al. 2019) ... GLUE (Wang et al. 2018) ... CMRC 2018 (Cui et al. 2018), DRCD (Shao et al. 2018), and DuReader (He et al. 2017). ... MSRA-NER (Levow 2006). ... XNLI (Conneau et al. 2018). ... ChnSentiCorp (https://github.com/pengming617/bert_classification). ... LCQMC (Liu et al. 2018), and BQ Corpus (Chen et al. 2018). ... NLPCC-DBQA (http://tcci.ccf.org.cn/conference/2016/dldoc/evagline2.pdf). |
| Dataset Splits | Yes | Table 5: The results on the GLUE benchmark, where the results on the dev set are the median of five runs and the results on the test set are scored by the GLUE evaluation server (https://gluebenchmark.com/leaderboard). |
| Hardware Specification | Yes | ERNIE 2.0 is trained on 48 NVIDIA V100 GPU cards for the base model and 64 NVIDIA V100 GPU cards for the large model, in both English and Chinese. |
| Software Dependencies | No | The ERNIE 2.0 framework is implemented on PaddlePaddle, which is an end-to-end open source deep learning platform developed by Baidu. No version number is provided for PaddlePaddle. |
| Experiment Setup | Yes | We use the Adam optimizer, whose parameters are fixed to β1 = 0.9, β2 = 0.98, with a batch size of 393,216 tokens. The learning rate is set to 5e-5 for the English model and 1.28e-4 for the Chinese model, scheduled by the Noam decay scheme (Vaswani et al. 2017) with warmup over the first 4,000 steps for every pre-training task (see the learning-rate sketch after this table). ... Detailed fine-tuning settings for English tasks are shown in Table 3, and those for Chinese tasks in Table 4. |
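
Since the paper provides no pseudocode, the following is a minimal sketch of the sequential multi-task pre-training loop it describes: tasks are introduced one at a time, and each stage trains the shared encoder on the new task together with all previously introduced ones. The task API (`sample_batch`, `loss`) and the round-robin interleaving are assumptions for illustration; the paper assigns training iterations across tasks automatically rather than strictly round-robin.

```python
def continual_multitask_pretrain(model, tasks, steps_per_stage):
    """Hedged sketch of ERNIE 2.0-style sequential multi-task learning.

    Each time a new pre-training task is introduced, it is trained
    jointly with all earlier tasks so previously learned knowledge is
    retained. `model.train_step`, `task.sample_batch`, and `task.loss`
    are hypothetical names, not the released PaddlePaddle API.
    """
    active_tasks = []
    for task in tasks:                     # introduce tasks one by one
        active_tasks.append(task)
        for _ in range(steps_per_stage):
            for t in active_tasks:         # simple round-robin; the paper
                batch = t.sample_batch()   # allocates iterations per task
                model.train_step(batch, loss_fn=t.loss)
    return model
```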
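
To make the quoted optimizer settings concrete, here is a small Python sketch of the Noam schedule (Vaswani et al. 2017) with the stated 4,000 warmup steps. Normalizing the schedule so the rate peaks at the configured value at the end of warmup is an assumed convention; the paper does not spell out the exact scaling.

```python
import math

def noam_lr(step: int, peak_lr: float = 5e-5, warmup_steps: int = 4000) -> float:
    """Noam schedule: linear warmup for `warmup_steps` steps, then
    inverse square-root decay. peak_lr=5e-5 matches the quoted English
    model; the Chinese model uses 1.28e-4. The scaling so the rate tops
    out at exactly `peak_lr` is an assumption, not stated in the paper.
    """
    step = max(step, 1)  # guard against division by zero at step 0
    scale = peak_lr * math.sqrt(warmup_steps)
    return scale * min(step ** -0.5, step * warmup_steps ** -1.5)

# The quoted Adam settings are beta1 = 0.9, beta2 = 0.98, with a batch
# size of 393,216 tokens per optimization step.
```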