Leveraging Web Semantic Knowledge in Word Representation Learning
Authors: Haoyan Liu, Lei Fang, Jian-Guang Lou, Zhoujun Li (pp. 6746-6753)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that our approach outperforms the state-of-the-art methods on word similarity, word sense disambiguation, text classification and textual similarity tasks. |
| Researcher Affiliation | Collaboration | Haoyan Liu (1), Lei Fang (2), Jian-Guang Lou (2), Zhoujun Li (1). Affiliations: (1) State Key Lab of Software Development Environment, Beihang University, Beijing, China; (2) Microsoft Research, Beijing, China |
| Pseudocode | No | The paper describes algorithmic steps and formulas but does not provide structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code and data to reproduce the results are available at https://github.com/haoyanliu/wesek. |
| Open Datasets | Yes | To make sure that comparisons are fair, we train all embeddings on the English Wikipedia dump (http://dumps.wikimedia.org/enwiki/). Words with a frequency below 5 are filtered out. The training data has around 1.2 billion tokens with a vocabulary size of 2.9 million. |
| Dataset Splits | No | The paper evaluates on various datasets for different NLP tasks but does not explicitly provide detailed train/validation/test splits (e.g., percentages, sample counts, or explicit statements about standard splits used for their evaluation) for its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for its experiments. |
| Software Dependencies | No | The paper mentions various software packages and toolkits used (e.g., 'word2vec', 'fastText', 'GloVe', 'IMS system', 'SentEval'), but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The default size of the utilized word vectors is 300. For word2vec, we use the skip-gram model with negative sampling; set both context window size and the number of negative samples as 10, learning rate as 0.025; and run the algorithm for 3 iterations. [...] For the semantic knowledge step in WESEK, we sample 1 positive neighbor for each target word from the semantic similarity graph and draw 10 negative samples. We set λ = 0.1, and experimental results show that WESEK has robust performance when λ is less than 0.4, with λ = 0.1 achieving slightly better performance. |
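The reported setup maps directly onto standard skip-gram hyperparameters. Below is a minimal sketch of how the quoted baseline configuration could be reproduced with gensim, plus an illustrative guess at the λ-weighted semantic-knowledge term; the corpus file name and the `wesek_step` function are assumptions for illustration, not the authors' released implementation (see https://github.com/haoyanliu/wesek for the actual code).

```python
# Sketch only: the word2vec baseline settings quoted in the Experiment Setup row.
# Parameter names follow gensim >= 4.0 (older versions use size= and iter=).
import numpy as np
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Hypothetical pre-tokenized Wikipedia dump, one sentence per line.
corpus = LineSentence("enwiki_tokenized.txt")

model = Word2Vec(
    corpus,
    vector_size=300,  # default embedding size reported in the paper
    sg=1,             # skip-gram model
    negative=10,      # 10 negative samples
    window=10,        # context window size 10
    alpha=0.025,      # learning rate
    min_count=5,      # words with frequency below 5 are filtered out
    epochs=3,         # 3 iterations
)
model.wv.save_word2vec_format("sgns_baseline.vec")


def wesek_step(w_target, w_pos, w_negs, lam=0.1):
    """Hedged sketch of the semantic-knowledge step described above:
    one positive neighbor sampled from the semantic similarity graph,
    10 negative samples, weighted by lambda = 0.1 against the skip-gram
    objective. The exact objective may differ from the paper's formulation."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    loss_pos = -np.log(sigmoid(w_target @ w_pos))            # pull toward the sampled neighbor
    loss_neg = -np.sum(np.log(sigmoid(-(w_negs @ w_target))))  # push away from negatives
    return lam * (loss_pos + loss_neg)
```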