Implanting Rational Knowledge into Distributed Representation at Morpheme Level

Authors: Zi Lin, Yang Liu (pp. 2954-2961)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For evaluation, we validate the paradigmatic and syntagmatic relations of morpheme embeddings, and apply the obtained embeddings to word similarity measurement, achieving significant improvements over the classical models by more than 5 Spearman scores or 8 percentage points, which shows very promising prospects for adoption of the new source of knowledge. The experimental results of the 5 models are shown in Table 10.
Researcher Affiliation | Academia | Zi Lin (1,3) and Yang Liu (2,3); affiliations: 1 Department of Chinese Language and Literature, Peking University; 2 Institute of Computational Linguistics, Peking University; 3 Key Laboratory of Computational Linguistics (Ministry of Education), Peking University. Contact: {zi.lin, liuyang}@pku.edu.cn
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper states: "The data of morpheme embeddings and word similarity measurement is available at https://github.com/zi-lin/MC for research purpose." This statement refers to the data, not explicitly to the source code of the methodology, so it does not meet the criteria for an unambiguous release of the methodology's source code.
Open Datasets | Yes | wordsim-296 (Jin and Wu 2012) and PKU-500 (Wu and Li 2016) are used as evaluation datasets.
Dataset Splits | No | The paper mentions training data for morpheme embeddings and the test sets used for evaluation (wordsim-296 and PKU-500), but does not specify validation splits or any other partitioning details needed for reproduction.
Hardware Specification | No | No specific hardware (such as GPU/CPU models or cloud instance types) used for the experiments is mentioned in the paper.
Software Dependencies | No | The paper mentions using word2vec and CBOW but provides no version numbers for these or any other software dependencies, which reproducibility requires.
Experiment Setup | Yes | "For morpheme embeddings on these 54,880,628 pseudo-sentences, we set the dimension to 20 and context window size to 3 to include all the rational knowledge when the MC is the target word." and "In the experiments, the dimension is set to 50, and the context window size is set to 5." and "Eventually, 9 types of word-formation pattern in the test sets (see description below) are assigned with different weights for the morphemes, as shown in Table 9."
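The reported setup (dimension-20 morpheme embeddings trained with a context window of 3 on pseudo-sentences) can be sketched as follows. This is a minimal NumPy illustration of CBOW-style training, not the authors' code; the tiny pseudo-sentences and morpheme names below are hypothetical stand-ins for the paper's 54,880,628 pseudo-sentences.

```python
import numpy as np

# Hypothetical toy pseudo-sentences: an MC (morpheme) as target word
# surrounded by items drawn from its rational knowledge.
rng = np.random.default_rng(0)
pseudo_sentences = [
    ["MC_light", "bright", "lamp", "sun"],
    ["MC_light", "weight", "feather", "thin"],
]
vocab = sorted({m for s in pseudo_sentences for m in s})
idx = {m: i for i, m in enumerate(vocab)}
V, D, WINDOW, LR = len(vocab), 20, 3, 0.05  # dimension 20, window 3 (as reported)

W_in = rng.normal(scale=0.1, size=(V, D))   # input (context) embeddings
W_out = rng.normal(scale=0.1, size=(V, D))  # output (target) embeddings

for _ in range(50):                          # a few epochs over the toy corpus
    for sent in pseudo_sentences:
        for pos, target in enumerate(sent):
            ctx = [idx[w] for j, w in enumerate(sent)
                   if j != pos and abs(j - pos) <= WINDOW]
            if not ctx:
                continue
            h = W_in[ctx].mean(axis=0)       # average the context vectors
            scores = W_out @ h
            p = np.exp(scores - scores.max())
            p /= p.sum()                     # softmax over the vocabulary
            p[idx[target]] -= 1.0            # gradient of -log p[target] w.r.t. scores
            grad_h = W_out.T @ p
            W_out -= LR * np.outer(p, h)
            W_in[ctx] -= LR * grad_h / len(ctx)

morpheme_vec = W_in[idx["MC_light"]]         # a 20-dimensional morpheme embedding
print(morpheme_vec.shape)                    # → (20,)
```

Full softmax is used here only because the toy vocabulary is tiny; at the paper's scale a word2vec implementation with hierarchical softmax or negative sampling would be the practical choice.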
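The evaluation summarized above scores word-similarity predictions with Spearman's rank correlation against human ratings. A self-contained sketch of that metric follows; the ratings are hypothetical toy values, not data from wordsim-296 or PKU-500, and ties are not handled.

```python
def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks (no tie correction)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Perfectly concordant rankings give rho = 1.0:
human = [0.9, 0.7, 0.4, 0.1]     # hypothetical human similarity ratings
model = [0.8, 0.6, 0.3, 0.05]    # hypothetical model cosine similarities
print(spearman(human, model))    # → 1.0
```

Because only the ranks matter, the metric rewards models that order word pairs the way humans do, regardless of the absolute similarity values.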