Semantics-Aware BERT for Language Understanding
Authors: Zhuosheng Zhang, Yuwei Wu, Hai Zhao, Zuchao Li, Shuailiang Zhang, Xi Zhou, Xiang Zhou
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model is evaluated on 11 benchmark datasets involving natural language inference, question answering, semantic similarity and text classification. SemBERT obtains new state-of-the-art on SNLI and also obtains significant gains on the GLUE benchmark and SQuAD 2.0. Ablation studies and analysis verify that our introduced explicit semantics is essential to the further performance improvement and SemBERT essentially and effectively works as a unified semantics-enriched language representation model. Table 1 shows results on the GLUE benchmark datasets, showing SemBERT gives substantial gains over BERT and outperforms all the previous state-of-the-art models in the literature. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science and Engineering, Shanghai Jiao Tong University 2Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China 3MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China 4College of Zhiyuan, Shanghai Jiao Tong University, China 5CloudWalk Technology, Shanghai, China {zhangzs, will8821}@sjtu.edu.cn, zhaohai@cs.sjtu.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the model architecture and steps in paragraph form and diagrams. |
| Open Source Code | Yes | The code is publicly available at https://github.com/cooelf/SemBERT. |
| Open Datasets | Yes | Our evaluation is performed on ten NLU benchmark datasets involving natural language inference, machine reading comprehension, semantic similarity and text classification. Some of these tasks are available from the recently released GLUE benchmark (Wang et al. 2018), which is a collection of nine NLU tasks. We also extend our experiments to two widely-used tasks, SNLI (Bowman et al. 2015) and SQuAD 2.0 (Rajpurkar, Jia, and Liang 2018) to show the superiority. |
| Dataset Splits | Yes | Hyper-parameters were selected using the dev set. |
| Hardware Specification | No | The paper does not specify any particular hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. |
| Software Dependencies | No | Our implementation is based on the PyTorch implementation of BERT. No specific version numbers for PyTorch or BERT are provided. |
| Experiment Setup | Yes | We set the initial learning rate in {8e-6, 1e-5, 2e-5, 3e-5} with warm-up rate of 0.1 and L2 weight decay of 0.01. The batch size is selected in {16, 24, 32}. The maximum number of epochs is set in [2, 5] depending on tasks. Texts are tokenized using wordpieces, with maximum length of 384 for SQuAD and 200 for other tasks. The dimension of SRL embedding is set to 10. The default maximum number of predicate-argument structures m is set to 3. (Hedged sketches of this setup follow the table.) |
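
The reported hyperparameters translate directly into a fine-tuning configuration. Below is a minimal sketch of that setup in plain PyTorch; `build_optimizer` is a hypothetical helper, and the concrete values picked from the search ranges are illustrative, not taken from the released code.

```python
# A minimal sketch of the reported fine-tuning setup, assuming a generic
# PyTorch model. The paper does not publish this code verbatim.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Hyperparameters reported in the paper (braced/bracketed values were searched over).
LEARNING_RATE = 2e-5   # chosen from {8e-6, 1e-5, 2e-5, 3e-5}
WARMUP_RATE = 0.1      # fraction of total steps used for warm-up
WEIGHT_DECAY = 0.01    # L2 weight decay
BATCH_SIZE = 32        # chosen from {16, 24, 32}
NUM_EPOCHS = 3         # chosen from [2, 5] depending on the task
MAX_SEQ_LEN = 200      # 384 for SQuAD 2.0, 200 for other tasks
SRL_EMBED_DIM = 10     # dimension of the SRL label embedding
MAX_STRUCTURES = 3     # default max predicate-argument structures m

def build_optimizer(model: torch.nn.Module, steps_per_epoch: int):
    """Create AdamW with warm-up, mirroring the reported schedule."""
    total_steps = steps_per_epoch * NUM_EPOCHS
    warmup_steps = int(total_steps * WARMUP_RATE)
    optimizer = AdamW(model.parameters(), lr=LEARNING_RATE,
                      weight_decay=WEIGHT_DECAY)

    def lr_lambda(step: int) -> float:
        # Linear warm-up, then linear decay to zero; the decay shape is an
        # assumption borrowed from the standard BERT fine-tuning recipe.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    return optimizer, LambdaLR(optimizer, lr_lambda)
```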
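
For the two model-specific values (SRL embedding dimension 10, at most m = 3 predicate-argument structures per sentence), the following is a hedged sketch of one way the aligned SRL label sequences could be embedded per token. The label vocabulary size and the linear fusion layer are assumptions; the paper's actual aggregation of the m structures differs and is not detailed in this table.

```python
# Illustrative only: embeds m SRL label sequences and fuses them per token.
import torch
import torch.nn as nn

NUM_SRL_LABELS = 30   # assumption: size of the SRL tag vocabulary
SRL_EMBED_DIM = 10    # reported embedding dimension
MAX_STRUCTURES = 3    # reported default m

class SrlEmbedding(nn.Module):
    """Look up m SRL label sequences and fuse them into one vector per token."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_SRL_LABELS, SRL_EMBED_DIM)
        # Simplification: concatenate the m structures and project back down.
        self.fuse = nn.Linear(MAX_STRUCTURES * SRL_EMBED_DIM, SRL_EMBED_DIM)

    def forward(self, srl_ids: torch.LongTensor) -> torch.Tensor:
        # srl_ids: (batch, m, seq_len) integer SRL label ids
        batch, m, seq_len = srl_ids.shape
        emb = self.embed(srl_ids)                  # (batch, m, seq, dim)
        emb = emb.permute(0, 2, 1, 3).reshape(batch, seq_len, m * SRL_EMBED_DIM)
        return self.fuse(emb)                      # (batch, seq, dim)
```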