Lexical Sememe Prediction via Word Embeddings and Matrix Factorization

Authors: Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, Maosong Sun

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we take a real-world sememe knowledge base HowNet for training and evaluation, and the results reveal the effectiveness of our method for lexical sememe prediction.
Researcher Affiliation | Academia | Ruobing Xie¹, Xingchi Yuan¹, Zhiyuan Liu¹,², Maosong Sun¹,². ¹Department of Computer Science and Technology, State Key Lab on Intelligent Technology and Systems, National Lab for Information Science and Technology, Tsinghua University, China. ²Jiangsu Collaborative Innovation Center for Language Ability, Jiangsu Normal University, China.
Pseudocode | No | The paper describes the proposed models and their formulations mathematically, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code of this paper can be obtained from https://github.com/thunlp/Sememe_prediction.
Open Datasets | Yes | We utilize the sememe annotations in HowNet for sememe prediction. HowNet contains 212,539 senses with annotations belonging to 103,843 words. The number of sememes in total is approximately 2,000. ... We use the Sogou-T corpus as the text corpus to learn Chinese word embeddings. Sogou-T is provided by a Chinese commercial search engine, which contains 2.7 billion words in total. (https://www.sogou.com/labs/resource/t.php)
Dataset Splits | No | The paper explicitly states a training set and a test set split: "we divide 60,000 of the words into train set and the rest 6,216 of them into test set." However, it does not mention a separate validation set or describe a cross-validation setup. (A minimal split sketch follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or cloud computing instances.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x).
Experiment Setup | Yes | We empirically set the dimension of word and sememe embeddings to be 200. In SPSE, we set the probability of zero elements to be decomposed in word-sememe matrix as 0.5%, and select the initial learning rate to be 0.01, which will descend through iterations. We set the ratio λ in Equation (2) to be 0.5. In SPWE, we set the hyper-parameter p to be 0.2 and the number of most related words K = 100. In ensemble models, we have tested on different weights and choose λ₁/λ₂ = 2.1. (Hedged sketches wiring up these values follow the table.)
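
For the Dataset Splits row: a minimal sketch of the reported 60,000/6,216 train/test partition of the annotated words. The function name and the shuffle seed are assumptions; the paper, as summarized above, does not say how words were assigned to each side and reports no validation set.

```python
import random

N_TRAIN = 60000  # paper-reported training-set size; the remaining 6,216 words form the test set

def split_words(words, n_train=N_TRAIN, seed=0):
    """Partition the annotated vocabulary into train and test sets.

    The random seed is an assumption: the paper does not state how the
    words were shuffled, and it describes no validation split.
    """
    words = list(words)
    random.Random(seed).shuffle(words)
    return words[:n_train], words[n_train:]
```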
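For the Experiment Setup row: a minimal SGD sketch of SPSE-style word-sememe matrix factorization, wired with the reported hyper-parameters (200-dimensional embeddings, 0.01 initial learning rate, 0.5% sampling of zero cells). The plain squared-error update and the data layout are illustrative assumptions, and the sememe-sememe term that λ = 0.5 weights in the paper's Equation (2) is omitted for brevity.

```python
import numpy as np

DIM = 200          # embedding dimension for words and sememes
LR = 0.01          # initial learning rate; the paper decays it over iterations
ZERO_PROB = 0.005  # probability of also fitting a zero cell (0.5%)

def spse_epoch(W, S, M, rng):
    """One epoch of factorizing the word-sememe matrix M ≈ W @ S.T.

    W: (n_words, DIM) word embeddings; S: (n_sememes, DIM) sememe
    embeddings; M: binary word-sememe annotation matrix. Non-zero cells
    are always fitted; zero cells are fitted with probability ZERO_PROB,
    mirroring the paper's 0.5% setting. The squared-error update below
    is an assumption, not the paper's exact objective.
    """
    n_words, n_sememes = M.shape
    for w in range(n_words):
        for s in range(n_sememes):
            if M[w, s] == 0 and rng.random() >= ZERO_PROB:
                continue  # skip 99.5% of zero entries
            err = M[w, s] - W[w] @ S[s]
            grad_w, grad_s = err * S[s], err * W[w]
            W[w] += LR * grad_w
            S[s] += LR * grad_s
```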
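Likewise for SPWE: a hedged sketch of nearest-neighbour sememe scoring using the reported K = 100 and p = 0.2. Treating p as a per-rank decay base is an assumption about how p enters the weighting; only K and p themselves come from the summary above.

```python
import numpy as np

K = 100  # number of most related words consulted per target word
P = 0.2  # hyper-parameter p, used here as a rank-decay base (an assumption)

def spwe_scores(target_vec, word_vecs, word_sememes):
    """Score candidate sememes for an unannotated target word.

    Collaborative-filtering style: the K words whose embeddings are
    closest to target_vec vote for their annotated sememes, with a
    weight that shrinks as the similarity rank grows. word_sememes maps
    a word index to its set of sememe ids.
    """
    sims = word_vecs @ target_vec / (
        np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(target_vec)
    )
    scores = {}
    for rank, idx in enumerate(np.argsort(-sims)[:K]):
        weight = sims[idx] * P ** rank  # hypothetical form of the p-decay
        for sememe in word_sememes[int(idx)]:
            scores[sememe] = scores.get(sememe, 0.0) + weight
    return scores
```

For the ensemble, the paper reports combining the two models with weights in the ratio λ₁/λ₂ = 2.1; a weighted sum of the SPSE and SPWE score tables would realize that combination here.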