Lexical Sememe Prediction via Word Embeddings and Matrix Factorization
Authors: Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, Maosong Sun
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we take a real-world sememe knowledge base HowNet for training and evaluation, and the results reveal the effectiveness of our method for lexical sememe prediction. |
| Researcher Affiliation | Academia | Ruobing Xie (1), Xingchi Yuan (1), Zhiyuan Liu (1,2), Maosong Sun (1,2). (1) Department of Computer Science and Technology, State Key Lab on Intelligent Technology and Systems, National Lab for Information Science and Technology, Tsinghua University, China; (2) Jiangsu Collaborative Innovation Center for Language Ability, Jiangsu Normal University, China |
| Pseudocode | No | The paper describes the proposed models and their formulations mathematically, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of this paper can be obtained from https://github.com/thunlp/Sememe_prediction. |
| Open Datasets | Yes | We utilize the sememe annotations in HowNet for sememe prediction. HowNet contains 212,539 senses with annotations belonging to 103,843 words. The number of sememes in total is approximately 2,000. ... We use the Sogou-T corpus (https://www.sogou.com/labs/resource/t.php) as the text corpus to learn Chinese word embeddings. Sogou-T is provided by a Chinese commercial search engine, which contains 2.7 billion words in total. |
| Dataset Splits | No | The paper states a training/test split: "we divide 60,000 of the words into train set and the rest 6,216 of them into test set." However, it does not mention a separate validation set or describe a cross-validation setup (a hedged sketch of this split follows the table). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or cloud computing instances. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | We empirically set the dimension of word and sememe embeddings to be 200. In SPSE, we set the probability of zero elements to be decomposed in the word-sememe matrix to 0.5%, and select the initial learning rate to be 0.01, which decays over iterations. We set the ratio λ in Equation (2) to be 0.5. In SPWE, we set the hyper-parameter p to be 0.2 and the number of most related words K = 100. In ensemble models, we tested different weights and chose λ1/λ2 = 2.1 (see the hedged code sketch after the table). |
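
The Dataset Splits row fixes only the split sizes (60,000 train / 6,216 test). The paper does not say how the annotated words were ordered or selected before splitting, so the shuffle and the seed in this minimal sketch are assumptions:

```python
import random

def split_hownet_words(words, train_size=60000, seed=0):
    """Split HowNet words into train/test sets by size only.

    The 60,000 / 6,216 sizes come from the paper; the random shuffle
    and the fixed seed are assumptions, since the paper does not
    describe how words were assigned to each set.
    """
    words = list(words)
    random.Random(seed).shuffle(words)
    return words[:train_size], words[train_size:]
```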
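
For the Experiment Setup row, the reported hyperparameters (200-dimensional embeddings, K = 100, p = 0.2, λ1/λ2 = 2.1) are enough to sketch how SPWE scores candidate sememes and how the ensemble combines the two models. This is a minimal sketch, not the authors' released code: the `similarity * p**rank` declining weight and the linear ensemble are our reading of the paper's formulation, and the SPSE scores are taken as an input from a separately trained matrix-factorization model.

```python
import numpy as np

def spwe_scores(w_vec, word_vecs, word_sememe, K=100, p=0.2):
    """SPWE-style sememe scoring for one unannotated word.

    w_vec       : (d,) embedding of the target word (d = 200 in the paper)
    word_vecs   : (V, d) embeddings of words with known sememes
    word_sememe : (V, S) binary word-sememe annotation matrix
    K, p        : values reported in the paper (K=100, p=0.2); the
                  similarity * p**rank weighting is our assumption about
                  the exact form of the declining confidence factor.
    """
    # Cosine similarity between the target word and every annotated word.
    sims = word_vecs @ w_vec / (
        np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(w_vec) + 1e-12)
    top = np.argsort(-sims)[:K]                    # K most similar words
    weights = sims[top] * p ** np.arange(top.size) # similarity * decline
    # Aggregate the neighbours' binary sememe annotations into scores.
    return weights @ word_sememe[top]              # shape (S,)

def ensemble_scores(spwe, spse, ratio=2.1):
    """Weighted ensemble of SPWE and SPSE sememe scores.

    The paper reports lambda1/lambda2 = 2.1; treating this ratio as a
    simple linear combination of the two score vectors is an assumption.
    """
    return ratio * spwe + spse
```

Sememes would then be ranked by the ensemble scores, with the top-ranked candidates proposed as annotations for the target word.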