Learning Term Embeddings for Lexical Taxonomies
Authors: Jingping Liu, Menghui Wang, Chao Wang, Jiaqing Liang, Lihan Chen, Haiyun Jiang, Yanghua Xiao, Yunwen Chen
AAAI 2021, pp. 6410-6417 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on two tasks to show that our approach outperforms other embedding methods and we use the learned term embeddings to enhance the performance of the state-of-the-art models that are based on BERT and RoBERTa on text classification. |
| Researcher Affiliation | Collaboration | 1) Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China; 2) DataGrand Inc., Shanghai, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements or links for the open-source code of the described methodology. Footnotes refer to BERT and RoBERTa implementations, which are third-party tools used in the experiments, not the authors' own code. |
| Open Datasets | Yes | We use Probase for evaluation, which is one of the largest LTs. To ensure the accuracy of the inputs, we filter the low-frequency relations with n(x, y) < 5. Then, similar to TransE (Bordes et al. 2013), we create two datasets by selecting relations whose terms' frequency is ranked in the Top-5K and Top-300K in Probase. These two datasets are denoted as Pro5K and Pro300K. The statistics of these two datasets are shown in Table 1. |
| Dataset Splits | Yes | Pro5K: 153,145 train / 19,143 valid or test; Pro300K: 1,090,811 train / 128,221 valid or test |
| Hardware Specification | Yes | During alternating training, we perform loss_ha with 4 epochs and loss_s with 1 epoch in turn and our models run on Windows 10 with Intel(R) Core(TM) i7-4790K CPU, GeForce GTX 980 and 32GB of RAM. |
| Software Dependencies | No | For the joint model, we select the learning rate λ = 0.001 for Adam. The number of negative samples n in both Eq. (3) and (8) is set to 5... Input BERT RoBERTa Basic... and 2) encoded by RoBERTa_BASE (Liu et al. 2019). |
| Experiment Setup | Yes | For the experiment with our model, we set α = 0.9 and ε = 0.5, respectively. For the joint model, we select the learning rate λ = 0.001 for Adam. The number of negative samples n in both Eq. (3) and (8) is set to 5. The dimension of term vector d in this paper and other compared methods is set to 128 and the vector of each term t is normalized by setting ||u_t||_2 = 1 and ||v_t||_2 = 1. During alternating training, we perform loss_ha with 4 epochs and loss_s with 1 epoch in turn. |
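
To make the reported setup concrete, below is a minimal PyTorch sketch of the training loop described in the Experiment Setup row. The paper does not release code, so `TermEmbedding`, `loss_ha`, and `loss_s` are illustrative placeholders (the two losses here are dummies); only the hyperparameter values are taken from the paper: d = 128, n = 5 negative samples, Adam with λ = 0.001, α = 0.9, ε = 0.5, the 4:1 alternation between loss_ha and loss_s, and the unit-norm constraints ||u_t||_2 = 1 and ||v_t||_2 = 1. The roles of α and ε are defined by the paper's equations and are not used by the dummy losses.

```python
# Sketch of the reported experiment configuration. Class and loss names are
# illustrative stand-ins, not the authors' code; only hyperparameter values
# come from the paper.
import torch
import torch.nn.functional as F

DIM = 128               # term vector dimension d
N_NEG = 5               # negative samples n in Eq. (3) and (8)
LR = 1e-3               # Adam learning rate lambda
ALPHA, EPS = 0.9, 0.5   # alpha and epsilon as reported (roles defined in the paper)


class TermEmbedding(torch.nn.Module):
    """Two vectors per term (u_t and v_t), each kept at unit L2 norm."""

    def __init__(self, n_terms: int, dim: int = DIM):
        super().__init__()
        self.u = torch.nn.Embedding(n_terms, dim)
        self.v = torch.nn.Embedding(n_terms, dim)

    def normalize(self):
        # Enforce ||u_t||_2 = 1 and ||v_t||_2 = 1 after each update.
        with torch.no_grad():
            self.u.weight.data = F.normalize(self.u.weight.data, dim=1)
            self.v.weight.data = F.normalize(self.v.weight.data, dim=1)


def loss_ha(model, batch):
    # Placeholder for the paper's loss_ha term (returns a zero-valued loss).
    return model.u(batch).sum() * 0.0


def loss_s(model, batch):
    # Placeholder for the paper's loss_s term (returns a zero-valued loss).
    return model.v(batch).sum() * 0.0


def train(model, batches, rounds: int = 10):
    opt = torch.optim.Adam(model.parameters(), lr=LR)
    for _ in range(rounds):
        # Alternating training: 4 epochs of loss_ha, then 1 epoch of loss_s.
        for loss_fn, epochs in ((loss_ha, 4), (loss_s, 1)):
            for _ in range(epochs):
                for batch in batches:
                    opt.zero_grad()
                    loss_fn(model, batch).backward()
                    opt.step()
                    model.normalize()


if __name__ == "__main__":
    # Dummy usage: random term-index batches stand in for the Probase training data.
    model = TermEmbedding(n_terms=1000)
    batches = [torch.randint(0, 1000, (32,)) for _ in range(4)]
    train(model, batches, rounds=1)
```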