Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Leveraging Web Semantic Knowledge in Word Representation Learning
Authors: Haoyan Liu, Lei Fang, Jian-Guang Lou, Zhoujun Li | Pages: 6746-6753
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that our approach outperforms the state-of-the-art methods on word similarity, word sense disambiguation, text classification and textual similarity tasks. |
| Researcher Affiliation | Collaboration | Haoyan Liu¹, Lei Fang², Jian-Guang Lou², Zhoujun Li¹; ¹State Key Lab of Software Development Environment, Beihang University, Beijing, China; ²Microsoft Research, Beijing, China |
| Pseudocode | No | The paper describes algorithmic steps and formulas but does not provide structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code and data to reproduce the results are available at https://github.com/haoyanliu/wesek. |
| Open Datasets | Yes | To make sure that comparisons are fair, we train all embeddings on the English Wikipedia dump [6]. Words with a frequency below 5 are filtered out. The training data has around 1.2 billion tokens with a vocabulary size of 2.9 million. [6] http://dumps.wikimedia.org/enwiki/ |
| Dataset Splits | No | The paper evaluates on various datasets for different NLP tasks but does not explicitly provide detailed train/validation/test splits (e.g., percentages, sample counts, or explicit statements about standard splits used for their evaluation) for its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for its experiments. |
| Software Dependencies | No | The paper mentions various software packages and toolkits used (e.g., 'word2vec', 'fastText', 'GloVe', 'IMS system', 'SentEval'), but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The default size of the utilized word vectors is 300. For word2vec, we use the skip-gram model with negative sampling; set both context window size and the number of negative samples as 10, learning rate as 0.025; and run the algorithm for 3 iterations. [...] For the semantic knowledge step in WESEK, we sample 1 positive neighbor for each target word from the semantic similarity graph and draw 10 negative samples. We set λ = 0.1, and experimental results show that WESEK has robust performance when λ is less than 0.4, with λ = 0.1 achieving slightly better performance. |
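
The settings quoted in the Open Datasets and Experiment Setup rows can be collected into a small configuration, which may help when checking a reproduction attempt against the reported values. The sketch below is illustrative only: the class names, field names, and the `filter_vocabulary` helper are hypothetical and are not taken from the paper or its repository. The role of λ in the training objective is not quoted in the table, so it is recorded as a plain value.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Word2VecSetup:
    """Skip-gram with negative sampling, values as quoted in the table."""
    vector_size: int = 300
    window: int = 10
    negative_samples: int = 10
    learning_rate: float = 0.025
    iterations: int = 3


@dataclass(frozen=True)
class WesekKnowledgeSetup:
    """Semantic knowledge step, values as quoted in the table."""
    positive_neighbors: int = 1   # sampled per target word from the similarity graph
    negative_samples: int = 10
    lam: float = 0.1              # lambda; how it enters the loss is not quoted here


# Hypothetical constant name; the threshold itself is from the Open Datasets row.
MIN_FREQUENCY = 5


def filter_vocabulary(tokens, min_frequency=MIN_FREQUENCY):
    """Count tokens and keep only words seen at least `min_frequency` times."""
    counts = Counter(tokens)
    return {word: n for word, n in counts.items() if n >= min_frequency}


def in_reported_robust_range(lam):
    """True if lambda is below 0.4, the range the quoted text reports as robust."""
    return lam < 0.4
```

For example, `filter_vocabulary(["a"] * 5 + ["b"] * 4)` keeps only `"a"`, since `"b"` falls below the frequency threshold of 5.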