Representing Words as Lymphocytes

Authors: Jinfeng Yang, Yi Guan, Xishuang Dong, Bin He

AAAI 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | experiments are conducted on the Penn Chinese Treebank 5.1. Experimental results indicate that the proposed word representations are effective. Experimental Results: A dependency Treebank, built from the CTB, is employed as experimental data. For evaluation of the proposed word representations, words in the first 100 sentences of the CTB are considered. For each considered word, five words with most high similarities, according to equation (3), are chosen for evaluation. Two precision metrics are used to evaluate those mined similar words. The one is the precision of top one P_Top1, which means the percentage of those considered words whose top one candidate word is judged similar. The second is the precision of top five P_Top5, which means the percentage of those considered words for which one of the top five candidate words is judged similar. For the purpose of impartial evaluation, two persons evaluated the candidate similar words independently. Experimental results in detail can be found in the section A3 of the Appendix. As shown in table 1, the evaluation results by two persons seem to be in high agreement. The results indicate that the proposed lymphocyte-style word representation can be successfully applied for word similarity computing and is proven to be an effective word representation.
Researcher Affiliation | Academia | Jinfeng Yang, Yi Guan, Xishuang Dong, Bin He; yangjinfeng2010@gmail.com, guanyi@hit.edu.cn, {dongxishuang, goohebingle}@gmail.com; School of Computer Science and Technology of Harbin Institute of Technology, Harbin, Heilongjiang, China 150001
Pseudocode | No | The paper provides mathematical formulations and descriptive text, but no explicit pseudocode or algorithm blocks are present.
Open Source Code | Yes | https://github.com/yangjinfeng/wordrep/blob/master/aaai2014_appendix.pdf
Open Datasets | Yes | Experimental results on the Penn Chinese Treebank 5.1 (CTB) (Xue et al. 2005) show that lymphocyte style word representation is an effective word representation. A dependency Treebank, built from the CTB, is employed as experimental data.
Dataset Splits | No | The paper states 'For evaluation of the proposed word representations, words in the first 100 sentences of the CTB are considered' but does not specify train, validation, or test dataset splits with percentages, counts, or predefined split references.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or solver versions).
Experiment Setup | No | The paper describes the evaluation metrics and the number of candidate words chosen but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed model training configuration in the main text.
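
The two precision metrics quoted in the Research Type row (precision of top one and precision of top five) can be illustrated with a minimal sketch. It assumes the human judgments are already available as one boolean flag per ranked candidate. The function name `precision_at_k`, the word keys, and the judgment values are hypothetical and are not taken from the paper, and the similarity ranking from equation (3) is not reproduced here.

```python
from typing import Dict, List


def precision_at_k(judgments: Dict[str, List[bool]], k: int) -> float:
    """Fraction of words with at least one candidate judged similar among the top k."""
    if not judgments:
        raise ValueError("no words to evaluate")
    hits = sum(1 for flags in judgments.values() if any(flags[:k]))
    return hits / len(judgments)


# Hypothetical judgments by one evaluator: for each considered word, one flag per
# ranked candidate (most similar first), True if the candidate was judged similar.
example = {
    "word_a": [True, False, False, False, False],
    "word_b": [False, False, True, False, False],
    "word_c": [False, False, False, False, False],
    "word_d": [True, True, False, False, False],
}

p_top1 = precision_at_k(example, 1)  # only the top candidate counts
p_top5 = precision_at_k(example, 5)  # any of the five candidates counts
print(f"P_Top1 = {p_top1:.2f}, P_Top5 = {p_top5:.2f}")
```

With two independent evaluators, as described above, the same function would be run once per evaluator's judgments and the resulting values compared.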