DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning
Authors: Qianglong Chen, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang, Yin Zhang
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on a variety of knowledge-driven and language understanding tasks, including NER, relation extraction, CommonsenseQA, OpenBookQA and GLUE. Experimental results demonstrate that our model can significantly improve typical PLMs: it gains a substantial improvement of 0.5%, 2.9%, 9.0%, 7.1% and 3.3% on BERT-large respectively, and is also effective on RoBERTa-large. |
| Researcher Affiliation | Collaboration | Qianglong Chen¹,², Feng-Lin Li², Guohai Xu², Ming Yan², Ji Zhang², Yin Zhang¹. ¹College of Computer Science and Technology, Zhejiang University, China; ²Alibaba Group, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for their method is publicly available. |
| Open Datasets | Yes | To pre-train DictBERT, we use the Cambridge Dictionary (https://dictionary.cambridge.org), which includes 315K entry words, as our pre-training corpus. We use CommonsenseQA [Talmor et al., 2019] and OpenBookQA [Mihaylov et al., 2018] to evaluate the ability of DictBERT acting as KBs and providing implicit knowledge to downstream tasks. We follow existing knowledge enhanced PLMs such as KEPLER and KnowBERT to use GLUE [Wang et al., 2018] to evaluate the general natural language understanding capability of our approach. |
| Dataset Splits | Yes | Table 5: Experimental results on the GLUE development set. The parameter of DictBERT is based on BERT-large. For pre-training, we use the BERT-large-uncased and RoBERTa-large model as backbone and set the learning rate to 1e-5, dropout rate to 0.1, max-length of tokens to 128, batch size to 32, and number of epochs to 10. For fine-tuning, we adopt cross-entropy loss as the loss function, set batch size to 32 and number of epochs to 30. We run 5 times for each task and report their average. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "BERT-large-uncased and RoBERTa-large model as backbone" and "AdamW as the optimizer", but does not provide specific version numbers for these software components or any other libraries/frameworks used. |
| Experiment Setup | Yes | For pre-training, we use the BERT-large-uncased and RoBERTa-large model as backbone and set the learning rate to 1e-5, dropout rate to 0.1, max-length of tokens to 128, batch size to 32, and number of epochs to 10. We use AdamW as the optimizer. For fine-tuning, we adopt cross-entropy loss as the loss function, set batch size to 32 and number of epochs to 30. We run 5 times for each task and report their average. |
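
For quick reference, the hyperparameters quoted in the Experiment Setup row can be written out as a configuration sketch. This is a minimal sketch assuming the Hugging Face Transformers API; the paper releases no code, so the library choice, argument names, and output paths below are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the reported DictBERT training setup, assuming the
# Hugging Face Transformers library; framework and paths are assumptions.
from transformers import AutoModelForMaskedLM, AutoTokenizer, TrainingArguments

# Backbones named in the paper: bert-large-uncased or roberta-large.
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-large-uncased")

# Pre-training settings quoted above: lr 1e-5, dropout 0.1 (the BERT default),
# max token length 128 (applied at tokenization time), batch size 32,
# 10 epochs, AdamW optimizer.
pretrain_args = TrainingArguments(
    output_dir="dictbert-pretrain",        # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    num_train_epochs=10,
    optim="adamw_torch",
)

# Fine-tuning settings quoted above: cross-entropy loss (the default for
# classification heads), batch size 32, 30 epochs, results averaged over 5 runs.
finetune_args = TrainingArguments(
    output_dir="dictbert-finetune",        # hypothetical path
    per_device_train_batch_size=32,
    num_train_epochs=30,
)
```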