reproducibilityindex.ai

Towards Semantics-Enhanced Pre-Training: Can Lexicon Definitions Help Learning Sentence Meanings?

Authors: Xuancheng Ren, Xu Sun, Houfeng Wang, Qun Liu
Pages: 13736-13744

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To verify whether the proposed method can enhance the semantic understanding of sentences, we conduct both intrinsic evaluation that inspects knowledge learned by the pre-trained models themselves and extrinsic evaluation on semantics-oriented downstream tasks with fine-tuning.
Researcher Affiliation	Collaboration	Xuancheng Ren (1), Xu Sun (1, 2), Houfeng Wang (1), Qun Liu (3). (1) MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University; (2) Center for Data Science, Peking University; (3) Huawei Noah's Ark Lab
Pseudocode	No	The paper describes the methods in text and mathematical formulas but does not include pseudocode or algorithm blocks.
Open Source Code	Yes	The code and the appendix are available at https://github.com/lancopku/sempre
Open Datasets	Yes	For general-purpose pre-training, we adopt the pre-trained RoBERTa-base and RoBERTa-large models (Liu et al. 2019)... They are trained on a combined corpus including fictions, encyclopedia, and news, totaling over 160GB text... For semantics-focused pre-training, the models are trained on word-definition pairs... we extract 0.2M word-definitions and 1.4M word-definition pairs in 23 relations from WordNet (Miller 1995).
Dataset Splits	Yes	We adopt early stopping based on validation accuracy and report the results of the best-scoring configuration on the validation set. For the testing protocol, we follow Zhou et al. (2020).
Hardware Specification	No	The paper mentions using "computation resources" but does not specify any particular hardware components such as CPU or GPU models, or memory details used for the experiments.
Software Dependencies	No	The paper mentions "Our implementation is based on the fairseq (Ott et al. 2019) package" but does not provide a specific version number for fairseq or any other software dependencies.
Experiment Setup	Yes	We use a batch size of 2048 sequences, a peak learning rate of 2 × 10^-5 with linear warm-up and decay peaked at the 295th update scheduled for at most 6910 updates and keep at most 128 tokens of a sequence. The batch size is 32. Each configuration is run multiple times with different random start. We adopt early stopping based on validation accuracy and report the results of the best-scoring configuration on the validation set. For downstream fine-tuning, following Liu et al. (2019); Bisk et al. (2020), we conduct a grid search with respect to certain hyper-parameters, i.e., the learning rates [1 × 10^-5, 2 × 10^-5, 3 × 10^-5] and the maximum epochs [10, 50].
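
The schedule and grid quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the paper's code or from fairseq. It assumes that updates are counted from 1, that the rate rises linearly to the peak at update 295, and that it then decays linearly to zero at update 6910 (the end value is an assumption, since the excerpt does not state it). The function and constant names are hypothetical.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
PEAK_LR = 2e-5
WARMUP_UPDATES = 295   # update at which the learning rate peaks
TOTAL_UPDATES = 6910   # maximum number of scheduled updates


def linear_warmup_decay_lr(update: int) -> float:
    """Learning rate at a 1-indexed update.

    Assumption: linear ramp up to PEAK_LR at WARMUP_UPDATES, then
    linear decay that reaches 0 at TOTAL_UPDATES.
    """
    if not 1 <= update <= TOTAL_UPDATES:
        raise ValueError("update must be in [1, TOTAL_UPDATES]")
    if update <= WARMUP_UPDATES:
        return PEAK_LR * (update / WARMUP_UPDATES)
    remaining = TOTAL_UPDATES - update
    return PEAK_LR * (remaining / (TOTAL_UPDATES - WARMUP_UPDATES))


def fine_tuning_grid() -> list[tuple[float, int]]:
    """All (learning rate, maximum epochs) pairs in the quoted grid search."""
    learning_rates = [1e-5, 2e-5, 3e-5]
    max_epochs = [10, 50]
    return list(product(learning_rates, max_epochs))


if __name__ == "__main__":
    for step in (1, 295, 3000, 6910):
        print(step, linear_warmup_decay_lr(step))
    print(fine_tuning_grid())
```

Under these assumptions the grid contains six configurations, and the best one would be chosen by validation accuracy, as the Dataset Splits row describes.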