Implanting Rational Knowledge into Distributed Representation at Morpheme Level
Authors: Zi Lin, Yang Liu | pp. 2954–2961
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "For evaluation, we validate the paradigmatic and syntagmatic relations of morpheme embeddings, and apply the obtained embeddings to word similarity measurement, achieving significant improvements over the classical models by more than 5 Spearman scores or 8 percentage points, which shows very promising prospects for adoption of the new source of knowledge." and "The experimental results of the 5 models are shown in Table 10." |
| Researcher Affiliation | Academia | Zi Lin (1,3) and Yang Liu (2,3) — 1: Department of Chinese Language and Literature, Peking University; 2: Institute of Computational Linguistics, Peking University; 3: Key Laboratory of Computational Linguistics (Ministry of Education), Peking University. {zi.lin, liuyang}@pku.edu.cn |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper states 'The data of morpheme embeddings and word similarity measurement is available at https://github.com/zi-lin/MC for research purpose.' This statement refers to data, not explicitly to the source code of the methodology, and thus does not meet the criteria for an unambiguous release of the methodology's source code. |
| Open Datasets | Yes | wordsim-296 (Jin and Wu 2012) and PKU-500 (Wu and Li 2016) are used as evaluation datasets. |
| Dataset Splits | No | The paper names training data for the morpheme embeddings and the evaluation test sets (wordsim-296 and PKU-500), but it does not specify a validation split or any other dataset partitioning details that would be needed for reproduction. |
| Hardware Specification | No | No specific hardware specifications (like GPU/CPU models or cloud instance types) used for experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions using 'word2vec' and 'CBOW' but does not provide specific version numbers for these or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | "For morpheme embeddings on these 54,880,628 pseudo-sentences, we set the dimension to 20 and context window size to 3 to include all the rational knowledge when the MC is the target word." and "In the experiments, the dimension is set to 50, and the context window size is set to 5." and "Eventually, 9 types of word-formation pattern in the test sets (see description below) are assigned with different weights for the morphemes, as shown in Table 9." |
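The table quotes improvements "by more than 5 Spearman scores" on word-similarity benchmarks such as wordsim-296 and PKU-500. For readers unfamiliar with how such benchmarks are scored, here is a minimal pure-Python sketch: convert the model's similarity scores and the human ratings to ranks (averaging ranks over ties), then compute the Pearson correlation of the ranks. The function names and toy data below are illustrative and not taken from the paper.

```python
def ranks(values):
    """Return 1-based ranks of `values`, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rank correlation between two equal-length score lists."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy example: model cosine similarities vs. human ratings for 3 word pairs.
print(spearman([0.2, 0.5, 0.9], [0.1, 0.4, 0.8]))  # → 1.0 (same ordering)
```

A "Spearman score" improvement of 5 points in the paper's phrasing corresponds to a gain of 0.05 in this correlation (reported on a 0–100 scale).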