Learning Term Embeddings for Lexical Taxonomies

Authors: Jingping Liu, Menghui Wang, Chao Wang, Jiaqing Liang, Lihan Chen, Haiyun Jiang, Yanghua Xiao, Yunwen Chen

Venue: AAAI 2021

Reproducibility variables, each with the assessed result and the supporting LLM response (quoted from the paper where applicable):

Research Type: Experimental
"We conduct extensive experiments on two tasks to show that our approach outperforms other embedding methods and we use the learned term embeddings to enhance the performance of the state-of-the-art models that are based on BERT and RoBERTa on text classification."

Researcher Affiliation: Collaboration
"(1) Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China; (2) DataGrand Inc., Shanghai, China"

Pseudocode: No
The paper does not contain any structured pseudocode or algorithm blocks.

Open Source Code: No
The paper provides no statement or link releasing code for the described methodology. Its footnotes point to BERT and RoBERTa implementations, which are third-party tools used in the experiments, not the authors' own code.

Open Datasets: Yes
"We use Probase for evaluation, which is one of the largest LTs. To ensure the accuracy of the inputs, we filter the low-frequency relations with n(x, y) < 5. Then, similar to TransE (Bordes et al. 2013), we create two datasets constructed by selecting relations and the frequency of terms in these relations needs to be ranked in Top-5K and Top-300K in Probase. These two datasets are denoted as Pro5K and Pro300K. The statistics of these two datasets are shown in Table 1." (A code sketch of this construction appears at the end of this section.)

Dataset Splits: Yes
Dataset | # Train | # Valid or Test
Pro5K | 153,145 | 19,143
Pro300K | 1,090,811 | 128,221

Hardware Specification: Yes
"During alternating training, we perform loss_ha with 4 epochs and loss_s with 1 epoch in turn and our models run on Windows 10 with Intel(R) Core(TM) i7-4790K CPU, GeForce GTX 980 and 32GB of RAM."

Software Dependencies: No
The paper names tools but no versioned dependencies: "For the joint model, we select the learning rate λ = 0.001 for Adam. The number of negative samples n in both Eq. (3) and (8) is set to 5 ... and 2) encoded by RoBERTa-BASE (Liu et al. 2019)."

Experiment Setup: Yes
"For the experiment with our model, we set α = 0.9 and ε = 0.5, respectively. For the joint model, we select the learning rate λ = 0.001 for Adam. The number of negative samples n in both Eq. (3) and (8) is set to 5. The dimension of term vector d in this paper and other compared methods is set to 128 and the vector of each term t is normalized by setting ||u_t||_2 = 1 and ||v_t||_2 = 1. During alternating training, we perform loss_ha with 4 epochs and loss_s with 1 epoch in turn." (A training-loop sketch also appears at the end of this section.)
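
The dataset construction quoted under Open Datasets is mechanical enough to sketch. Below is a minimal illustration, assuming relations arrive as (hyponym, hypernym, frequency) triples and that "Top-5K/Top-300K" means ranking terms by their total relation frequency; the toy data and every name here (build_dataset, min_freq, and so on) are assumptions for illustration, not the authors' code or data.

```python
from collections import Counter

# Toy stand-in for Probase isA pairs: (hyponym, hypernym, frequency n(x, y)).
relations = [
    ("apple", "fruit", 120),
    ("banana", "fruit", 95),
    ("apple", "company", 80),
    ("kiwi", "fruit", 3),  # dropped below: n(x, y) < 5
]

def build_dataset(relations, top_k, min_freq=5):
    """Filter low-frequency relations, then keep relations whose terms
    rank in the top-k by frequency (the Pro5K / Pro300K recipe)."""
    # Drop low-frequency relations (the paper filters n(x, y) < 5).
    frequent = [(x, y, n) for (x, y, n) in relations if n >= min_freq]

    # Rank terms by total frequency over the surviving relations
    # (one plausible reading of "frequency of terms ... ranked in Top-5K").
    term_freq = Counter()
    for x, y, n in frequent:
        term_freq[x] += n
        term_freq[y] += n
    top_terms = {t for t, _ in term_freq.most_common(top_k)}

    # Keep relations whose terms both survive the top-k cut.
    return [(x, y, n) for (x, y, n) in frequent
            if x in top_terms and y in top_terms]

pro5k = build_dataset(relations, top_k=5_000)      # Top-5K terms -> Pro5K
pro300k = build_dataset(relations, top_k=300_000)  # Top-300K terms -> Pro300K
```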
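The hyperparameters and alternating schedule quoted under Experiment Setup can likewise be sketched. The PyTorch-style snippet below assumes two embedding tables for u_t and v_t and treats loss_ha and loss_s as opaque callables, with negative sampling (n = 5) left inside those placeholders. The authors released no code, so this is a reading of the quoted setup, not their implementation.

```python
import torch

# Reported setup: d = 128, Adam with learning rate 0.001, unit-normalized
# term vectors, and alternating training (4 epochs of loss_ha, then 1 epoch
# of loss_s, in turn). TermEmbedding and both loss callables are assumed
# placeholders, not the paper's actual model.

class TermEmbedding(torch.nn.Module):
    def __init__(self, num_terms, dim=128):
        super().__init__()
        self.u = torch.nn.Embedding(num_terms, dim)  # u_t vectors
        self.v = torch.nn.Embedding(num_terms, dim)  # v_t vectors

    @torch.no_grad()
    def renormalize(self):
        # Enforce ||u_t||_2 = 1 and ||v_t||_2 = 1 after each update.
        self.u.weight.div_(self.u.weight.norm(dim=1, keepdim=True))
        self.v.weight.div_(self.v.weight.norm(dim=1, keepdim=True))

def alternate_train(model, batches_ha, batches_s, loss_ha, loss_s, rounds=10):
    opt = torch.optim.Adam(model.parameters(), lr=0.001)
    for _ in range(rounds):
        # "we perform loss_ha with 4 epochs and loss_s with 1 epoch in turn"
        for loss_fn, batches, epochs in ((loss_ha, batches_ha, 4),
                                         (loss_s, batches_s, 1)):
            for _ in range(epochs):
                for batch in batches:
                    opt.zero_grad()
                    loss_fn(model, batch).backward()
                    opt.step()
                    model.renormalize()
```

Renormalizing after every optimizer step is one straightforward way to maintain the unit-norm constraint the paper states; the paper does not specify where in the loop the normalization happens.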