Semantics-Aware BERT for Language Understanding
Authors: Zhuosheng Zhang, Yuwei Wu, Hai Zhao, Zuchao Li, Shuailiang Zhang, Xi Zhou, Xiang Zhou
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model is evaluated on 11 benchmark datasets involving natural language inference, question answering, semantic similarity and text classification. SemBERT obtains new state-of-the-art on SNLI and also obtains significant gains on the GLUE benchmark and SQuAD 2.0. Ablation studies and analysis verify that our introduced explicit semantics is essential to the further performance improvement and SemBERT essentially and effectively works as a unified semantics-enriched language representation model. Table 1 shows results on the GLUE benchmark datasets, showing SemBERT gives substantial gains over BERT and outperforms all the previous state-of-the-art models in the literature. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science and Engineering, Shanghai Jiao Tong University 2Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China 3MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China 4College of Zhiyuan, Shanghai Jiao Tong University, China 5CloudWalk Technology, Shanghai, China {zhangzs, will8821}@sjtu.edu.cn, zhaohai@cs.sjtu.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the model architecture and steps in paragraph form and diagrams. |
| Open Source Code | Yes | The code is publicly available at https://github.com/cooelf/SemBERT. |
| Open Datasets | Yes | Our evaluation is performed on ten NLU benchmark datasets involving natural language inference, machine reading comprehension, semantic similarity and text classification. Some of these tasks are available from the recently released GLUE benchmark (Wang et al. 2018), which is a collection of nine NLU tasks. We also extend our experiments to two widely-used tasks, SNLI (Bowman et al. 2015) and SQuAD 2.0 (Rajpurkar, Jia, and Liang 2018) to show the superiority. |
| Dataset Splits | Yes | Hyper-parameters were selected using the dev set. |
| Hardware Specification | No | The paper does not specify any particular hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. |
| Software Dependencies | No | Our implementation is based on the PyTorch implementation of BERT. No specific version numbers for PyTorch or BERT are provided. |
| Experiment Setup | Yes | We set the initial learning rate in {8e-6, 1e-5, 2e-5, 3e-5} with warm-up rate of 0.1 and L2 weight decay of 0.01. The batch size is selected in {16, 24, 32}. The maximum number of epochs is set in [2, 5] depending on tasks. Texts are tokenized using wordpieces, with maximum length of 384 for SQuAD and 200 for other tasks. The dimension of SRL embedding is set to 10. The default maximum number of predicate-argument structures m is set to 3. (Hedged sketches of this setup follow the table.) |
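
The reported hyperparameters translate directly into a fine-tuning configuration. Below is a minimal sketch of that setup in plain PyTorch; `build_optimizer` is a hypothetical helper, and the concrete values picked from the search ranges are illustrative, not taken from the released code.

```python
# A minimal sketch of the reported fine-tuning setup, assuming a generic
# PyTorch model. The paper does not publish this code verbatim.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Hyperparameters reported in the paper (braced/bracketed values were searched over).
LEARNING_RATE = 2e-5   # chosen from {8e-6, 1e-5, 2e-5, 3e-5}
WARMUP_RATE = 0.1      # fraction of total steps used for warm-up
WEIGHT_DECAY = 0.01    # L2 weight decay
BATCH_SIZE = 32        # chosen from {16, 24, 32}
NUM_EPOCHS = 3         # chosen from [2, 5] depending on the task
MAX_SEQ_LEN = 200      # 384 for SQuAD 2.0, 200 for other tasks
SRL_EMBED_DIM = 10     # dimension of the SRL label embedding
MAX_STRUCTURES = 3     # default max predicate-argument structures m

def build_optimizer(model: torch.nn.Module, steps_per_epoch: int):
    """Create AdamW with warm-up, mirroring the reported schedule."""
    total_steps = steps_per_epoch * NUM_EPOCHS
    warmup_steps = int(total_steps * WARMUP_RATE)
    optimizer = AdamW(model.parameters(), lr=LEARNING_RATE,
                      weight_decay=WEIGHT_DECAY)

    def lr_lambda(step: int) -> float:
        # Linear warm-up, then linear decay to zero; the decay shape is an
        # assumption borrowed from the standard BERT fine-tuning recipe.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    return optimizer, LambdaLR(optimizer, lr_lambda)
```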
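
For the two model-specific values (SRL embedding dimension 10, at most m = 3 predicate-argument structures per sentence), the following is a hedged sketch of one way the aligned SRL label sequences could be embedded per token. The label vocabulary size and the linear fusion layer are assumptions; the paper's actual aggregation of the m structures differs and is not detailed in this table.

```python
# Illustrative only: embeds m SRL label sequences and fuses them per token.
import torch
import torch.nn as nn

NUM_SRL_LABELS = 30   # assumption: size of the SRL tag vocabulary
SRL_EMBED_DIM = 10    # reported embedding dimension
MAX_STRUCTURES = 3    # reported default m

class SrlEmbedding(nn.Module):
    """Look up m SRL label sequences and fuse them into one vector per token."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_SRL_LABELS, SRL_EMBED_DIM)
        # Simplification: concatenate the m structures and project back down.
        self.fuse = nn.Linear(MAX_STRUCTURES * SRL_EMBED_DIM, SRL_EMBED_DIM)

    def forward(self, srl_ids: torch.LongTensor) -> torch.Tensor:
        # srl_ids: (batch, m, seq_len) integer SRL label ids
        batch, m, seq_len = srl_ids.shape
        emb = self.embed(srl_ids)                  # (batch, m, seq, dim)
        emb = emb.permute(0, 2, 1, 3).reshape(batch, seq_len, m * SRL_EMBED_DIM)
        return self.fuse(emb)                      # (batch, seq, dim)
```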