Towards Semantics-Enhanced Pre-Training: Can Lexicon Definitions Help Learning Sentence Meanings?
Authors: Xuancheng Ren, Xu Sun, Houfeng Wang, Qun Liu (pp. 13736–13744)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify whether the proposed method can enhance the semantic understanding of sentences, we conduct both intrinsic evaluation that inspects knowledge learned by the pre-trained models themselves and extrinsic evaluation on semantics-oriented downstream tasks with fine-tuning. |
| Researcher Affiliation | Collaboration | Xuancheng Ren¹, Xu Sun¹,², Houfeng Wang¹, Qun Liu³ (¹ MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University; ² Center for Data Science, Peking University; ³ Huawei Noah's Ark Lab) |
| Pseudocode | No | The paper describes the methods in text and mathematical formulas but does not include pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and the appendix are available at https://github.com/lancopku/sempre |
| Open Datasets | Yes | For general-purpose pre-training, we adopt the pre-trained RoBERTa-base and RoBERTa-large models (Liu et al. 2019)... They are trained on a combined corpus including fictions, encyclopedia, and news, totaling over 160GB text... For semantics-focused pre-training, the models are trained on word-definition pairs... we extract 0.2M word-definitions and 1.4M word-definition pairs in 23 relations from WordNet (Miller 1995). (A minimal WordNet extraction sketch follows the table.) |
| Dataset Splits | Yes | We adopt early stopping based on validation accuracy and report the results of the best-scoring configuration on the validation set. For the testing protocol, we follow Zhou et al. (2020). |
| Hardware Specification | No | The paper mentions using "computation resources" but does not specify any particular hardware components such as CPU or GPU models, or memory details used for the experiments. |
| Software Dependencies | No | The paper mentions "Our implementation is based on the fairseq (Ott et al. 2019) package" but does not provide a specific version number for fairseq or any other software dependencies. |
| Experiment Setup | Yes | We use a batch size of 2048 sequences, a peak learning rate of 2 × 10⁻⁵ with linear warm-up and decay, peaked at the 295th update and scheduled for at most 6910 updates, and keep at most 128 tokens of a sequence. The batch size is 32. Each configuration is run multiple times with different random starts. We adopt early stopping based on validation accuracy and report the results of the best-scoring configuration on the validation set. For downstream fine-tuning, following Liu et al. (2019); Bisk et al. (2020), we conduct a grid search with respect to certain hyper-parameters, i.e., the learning rates [1 × 10⁻⁵, 2 × 10⁻⁵, 3 × 10⁻⁵] and the maximum epochs [10, 50]. (A fine-tuning grid-search sketch follows the table.) |
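
The WordNet statistics quoted in the Open Datasets row (0.2M word-definitions and 1.4M word-definition pairs across 23 relations) can be approximated with off-the-shelf tooling. The sketch below uses NLTK's WordNet interface to enumerate word-definition pairs and two example relations; the relation set, filtering, and output format are assumptions made here for illustration, not the authors' released preprocessing pipeline (that is in the linked GitHub repository).

```python
"""Minimal sketch: extracting word-definition pairs from WordNet via NLTK.

Illustrative only; relation names, filtering, and output format are assumptions,
not the authors' preprocessing code.
"""
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # make sure the WordNet corpus is available


def word_definition_pairs():
    """Yield (word, definition) pairs, one per lemma of each synset."""
    for synset in wn.all_synsets():
        gloss = synset.definition()
        for lemma in synset.lemma_names():
            yield lemma.replace("_", " "), gloss


def related_definition_pairs():
    """Yield (word, relation, definition-of-related-synset) triples for two
    example relations; the paper uses 23 WordNet relations in total."""
    relations = {
        "hypernym": lambda s: s.hypernyms(),
        "hyponym": lambda s: s.hyponyms(),
    }
    for synset in wn.all_synsets():
        for name, rel in relations.items():
            for target in rel(synset):
                for lemma in synset.lemma_names():
                    yield lemma.replace("_", " "), name, target.definition()


if __name__ == "__main__":
    pairs = word_definition_pairs()
    for _ in range(3):  # print a few examples
        print(next(pairs))
```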
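
The downstream fine-tuning protocol in the Experiment Setup row (grid over learning rates [1e-5, 2e-5, 3e-5] and maximum epochs [10, 50], batch size 32, multiple random starts, early stopping on validation accuracy, best configuration selected on the validation set) can be outlined as a simple driver loop. This is a minimal sketch under those assumptions: `train_and_evaluate` is a hypothetical placeholder for an actual fairseq fine-tuning run, not the authors' script.

```python
"""Minimal sketch of the downstream fine-tuning grid search described above.

The grid values mirror the paper; the driver loop and `train_and_evaluate`
are assumptions for illustration, not the authors' code.
"""
import itertools
import random

LEARNING_RATES = [1e-5, 2e-5, 3e-5]
MAX_EPOCHS = [10, 50]
BATCH_SIZE = 32
SEEDS = [1, 2, 3]  # "multiple times with different random starts"


def train_and_evaluate(lr: float, max_epochs: int, seed: int) -> float:
    """Hypothetical placeholder: fine-tune the model and return validation
    accuracy. Here it just returns a pseudo-random number so the sketch runs
    end to end."""
    random.seed(hash((lr, max_epochs, seed)) % (2 ** 32))
    return random.random()


def grid_search() -> tuple[dict, float]:
    """Try every (lr, epochs, seed) combination and keep the configuration with
    the best validation accuracy; in a real run, early stopping happens inside
    the training loop, per epoch, based on validation accuracy."""
    best_cfg, best_acc = None, float("-inf")
    for lr, epochs, seed in itertools.product(LEARNING_RATES, MAX_EPOCHS, SEEDS):
        acc = train_and_evaluate(lr, epochs, seed)
        if acc > best_acc:
            best_cfg = {"lr": lr, "max_epochs": epochs,
                        "seed": seed, "batch_size": BATCH_SIZE}
            best_acc = acc
    return best_cfg, best_acc


if __name__ == "__main__":
    cfg, acc = grid_search()
    print(f"best validation accuracy {acc:.3f} with config {cfg}")
```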