Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Representing Words as Lymphocytes

Authors: Jinfeng Yang, Yi Guan, Xishuang Dong, Bin He

AAAI 2014 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	experiments are conducted on the Penn Chinese Treebank 5.1. Experimental results indicate that the proposed word representations are effective. Experimental Results A dependency Treebank, built from the CTB, is employed as experimental data. For evaluation of the proposed word representations, words in the first 100 sentences of the CTB are considered. For each considered word, five words with most high similarities, according to equation (3), are chosen for evaluation. Two precision metrics are used to evaluate those mined similar words. The one is the precision of top one P_Top1, which means the percentage of those considered words whose top one candidate word is judged similar. The second is the precision of top five P_Top5, which means the percentage of those considered words for which one of the top five candidate words is judged similar. For the purpose of impartial evaluation, two persons evaluated the candidate similar words independently. Experimental results in detail can be found in the section A3 of the Appendix. As shown in table 1, the evaluation results by two persons seem to be in high agreement. The results indicate that the proposed lymphocyte-style word representation can be successfully applied for word similarity computing and is proven to be an effective word representation.
Researcher Affiliation	Academia	Jinfeng Yang, Yi Guan, Xishuang Dong, Bin He EMAIL, EMAIL,EMAIL School of Computer Science and Technology of Harbin Institute of Technology Harbin, Heilongjiang, China 150001
Pseudocode	No	The paper provides mathematical formulations and descriptive text, but no explicit pseudocode or algorithm blocks are present.
Open Source Code	Yes	https://github.com/yangjinfeng/wordrep/blob/master/aaai2014_appendix.pdf
Open Datasets	Yes	Experimental results on the Penn Chinese Treebank 5.1 (CTB) (Xue et al. 2005) show that lymphocyte style word representation is an effective word representation. A dependency Treebank, built from the CTB, is employed as experimental data.
Dataset Splits	No	The paper states 'For evaluation of the proposed word representations, words in the first 100 sentences of the CTB are considered' but does not specify train, validation, or test dataset splits with percentages, counts, or predefined split references.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies	No	The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or solver versions).
Experiment Setup	No	The paper describes the evaluation metrics and the number of candidate words chosen but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed model training configuration in the main text.
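
The two precision metrics quoted in the Research Type row (P_Top1 and P_Top5) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `judgments` structure, and the sample values are all hypothetical and serve only to show how the two percentages are defined. Each word maps to a list of booleans, one per ranked candidate, recording whether an evaluator judged that candidate similar.

```python
from typing import Dict, List


def precision_top1(judgments: Dict[str, List[bool]]) -> float:
    """Fraction of words whose first-ranked candidate was judged similar."""
    if not judgments:
        raise ValueError("no judged words supplied")
    hits = sum(1 for flags in judgments.values() if flags and flags[0])
    return hits / len(judgments)


def precision_topk(judgments: Dict[str, List[bool]], k: int = 5) -> float:
    """Fraction of words with at least one similar candidate in the top k."""
    if not judgments:
        raise ValueError("no judged words supplied")
    hits = sum(1 for flags in judgments.values() if any(flags[:k]))
    return hits / len(judgments)


# Hypothetical judgments from one evaluator (illustrative values only,
# not results from the paper). Candidates are ordered by similarity rank.
judgments = {
    "word_a": [True, False, False, False, False],
    "word_b": [False, False, True, False, False],
    "word_c": [False, False, False, False, False],
    "word_d": [True, True, False, False, False],
}

p_top1 = precision_top1(judgments)
p_top5 = precision_topk(judgments, k=5)
print(f"P_Top1 = {p_top1:.2f}, P_Top5 = {p_top5:.2f}")
```

Because the paper reports judgments from two independent evaluators, the same functions would be applied once per evaluator and the resulting pairs of values compared.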