Training and Evaluating Improved Dependency-Based Word Embeddings

Authors: Chen Li, Jianxin Li, Yangqiu Song, Ziwei Lin

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate and analyze our proposed approach using several direct and indirect tasks for word embeddings. Experimental results demonstrate that our embeddings are competitive to or better than state-of-the-art methods and significantly outperform other methods in terms of context stability.
Researcher Affiliation | Academia | Chen Li, Jianxin Li, Yangqiu Song, Ziwei Lin; Department of Computer Science & Engineering, Beihang University, Beijing 100191, China; Department of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Our system is publicly available at https://github.com/RingBDStack/dependency-based-w2v.
Open Datasets | No | We trained all embeddings based on a partial English Wikipedia corpus, which contains 388,900,648 tokens and 555,434 unique words. The version of the downloaded file is wikidata-20161020. The paper does not provide a direct link or formal citation for public access to the specific processed corpus used.
Dataset Splits | No | The paper mentions using standard benchmark datasets for evaluation, but it does not explicitly provide training/validation/test splits for the main Wikipedia corpus used to train the embeddings, nor does it specify validation splits for the downstream tasks' datasets where these differ from the standard test splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models) used for running the experiments.
Software Dependencies | No | The paper mentions the 'Word2Vec tool' and the 'Stanford neural-network dependency parser', but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | As we found that various dimensions (50, 300, 600, 1000) of word embeddings resulted in similar trends, only experimental results for 300-dimension embeddings will be reported. Meanwhile, we set the dimension of the dependency vector v(d_{w_{i,k-1}, w_{i,k}}) as 50, the initial dependency weight φ_{d_{w_{i,k-1}, w_{i,k}}} = 0.9, and initialize the word vector v(w), the positive dependency vector v(d), and other model parameters randomly. ... we dynamically adjust the context window size of target word w as follows: c_w = max(size_max - log f_w, size_min).
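The dynamic context window rule quoted above can be illustrated with a minimal sketch. The snippet below assumes size_max and size_min are the maximum and minimum window sizes and f_w is the corpus frequency of the target word; the natural-log base and the default values for size_max and size_min are assumptions for illustration, not values taken from the paper.

```python
import math

def dynamic_window(freq_w, size_max=10, size_min=2):
    """Sketch of the dynamic context window c_w = max(size_max - log f_w, size_min).

    freq_w   -- corpus frequency of the target word w
    size_max -- assumed maximum window size (not specified in the quoted setup)
    size_min -- assumed minimum window size (not specified in the quoted setup)
    The logarithm base (natural log) is also an assumption.
    """
    return max(int(round(size_max - math.log(freq_w))), size_min)

# Frequent words get narrower windows, rare words wider ones.
print(dynamic_window(1_000_000))  # high-frequency word -> window clipped to size_min
print(dynamic_window(50))         # low-frequency word  -> larger window
```

The max(..., size_min) floor keeps very frequent words from collapsing to a zero or negative window, while rare words retain a wide context.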