reproducibilityindex.ai

Embedding Semantic Relations into Word Representations

Authors: Danushka Bollegala, Takanori Maehara, Ken-ichi Kawarabayashi

IJCAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our proposed method statistically significantly outperforms the current state-of-the-art word representations on three benchmark datasets for proportional analogy detection, demonstrating its ability to accurately capture the semantic relations among words.
Researcher Affiliation	Academia	Danushka Bollegala, Takanori Maehara, Ken-ichi Kawarabayashi; University of Liverpool; Shizuoka University; National Institute of Informatics; JST, ERATO, Kawarabayashi Large Graph Project.
Pseudocode	Yes	Algorithm 1 Learning word representations. ... The pseudo code for the proposed method is shown in Algorithm 1.
Open Source Code	No	We use the publicly available implementations [2, 3] by the original authors for training the word representations using the recommended parameter values. [2] https://code.google.com/p/word2vec/ [3] http://nlp.stanford.edu/projects/glove/ These links are for the baseline methods, not for the authors' proposed method.
Open Datasets	Yes	We use the ukWaC corpus [1] to extract relationally similar (positive) and dissimilar (negative) pairs of patterns (p_i, p_j) to train the proposed method. The ukWaC is a 2 billion word corpus constructed from the Web limiting the crawl to the .uk domain. [1] http://wacky.sslmit.unibo.it
Dataset Splits	No	The paper mentions 'The total number of training instances we select is N = 50,000 + 50,000 = 100,000.' and uses 'test' datasets, but it does not explicitly define a separate 'validation' split or its size/proportion.
Hardware Specification	No	The paper does not provide specific details regarding the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies	No	The paper mentions using 'publicly available implementations' for baselines, but does not provide specific version numbers for any software dependencies required to replicate their own work or the baselines.
Experiment Setup	Yes	All methods compared in Table 1 are trained on the same ukWaC corpus of 2B tokens to produce 300-dimensional word vectors. ... In all of our experiments, the proposed method converged with less than 5 iterations.
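The Research Type row refers to proportional analogy detection (a:b :: c:d). As a rough illustration of how word vectors are commonly scored on such tasks, the sketch below uses the vector-offset (3CosAdd) method, which picks the word d whose vector has the highest cosine similarity to b - a + c. This is an assumption for illustration only: the evaluation protocol used in the paper may differ, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Return the word d maximizing cos(d, b - a + c), excluding a, b, c (3CosAdd)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy 3-dimensional vectors, invented for illustration (not learned from any corpus).
toy = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, -1.0],
}
print(solve_analogy(toy, "man", "woman", "king"))  # queen
```

Excluding the three query words is a common convention, since b - a + c is often closest to b or c itself.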
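The Open Datasets row says that relationally similar (positive) and dissimilar (negative) pattern pairs are extracted from ukWaC, but the selection criterion is not quoted here. One common heuristic for relational similarity between lexical patterns is to compare the word pairs each pattern co-occurs with. The sketch below labels the most similar pattern pairs +1 and the least similar -1 under that heuristic. It is a hypothetical stand-in, not the paper's procedure; `label_pattern_pairs` and the toy occurrences are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def pattern_vectors(occurrences):
    """Map each pattern to a Counter of the word pairs (a, b) it co-occurs with."""
    vecs = defaultdict(Counter)
    for pattern, a, b in occurrences:
        vecs[pattern][(a, b)] += 1
    return vecs


def sparse_cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def label_pattern_pairs(occurrences, k):
    """Hypothetical labelling: top-k similar pattern pairs -> +1, bottom-k -> -1."""
    vecs = pattern_vectors(occurrences)
    scored = sorted(
        ((sparse_cosine(vecs[p], vecs[q]), p, q) for p, q in combinations(sorted(vecs), 2)),
        reverse=True,
    )
    if 2 * k > len(scored):
        raise ValueError("not enough pattern pairs for disjoint positive and negative sets")
    positives = [(p, q, +1) for _, p, q in scored[:k]]
    negatives = [(p, q, -1) for _, p, q in scored[-k:]]
    return positives + negatives


# Toy (pattern, X, Y) occurrences, invented for illustration.
occ = [
    ("X is a Y", "ostrich", "bird"), ("X is a Y", "salmon", "fish"),
    ("X is a kind of Y", "ostrich", "bird"), ("X is a kind of Y", "salmon", "fish"),
    ("X lives in Y", "ostrich", "africa"), ("X lives in Y", "salmon", "river"),
]
for p, q, label in label_pattern_pairs(occ, k=1):
    print(label, p, "|", q)
```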
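The Dataset Splits row reports N = 50,000 + 50,000 = 100,000 training instances and no validation split. A reproducer who wants one could sample a class-balanced set and hold out a fraction with a fixed seed, as in this minimal sketch. The 10% validation fraction, the seed, the function name `balanced_split`, and the placeholder instances are all assumptions, not details from the paper.

```python
import random


def balanced_split(positives, negatives, n_per_class, val_fraction=0.1, seed=0):
    """Sample n_per_class instances per class, shuffle, and hold out a validation split.

    The validation split is a hypothetical addition: the paper does not define one.
    """
    rng = random.Random(seed)  # fixed seed so the same split can be regenerated
    data = rng.sample(positives, n_per_class) + rng.sample(negatives, n_per_class)
    rng.shuffle(data)
    n_val = int(len(data) * val_fraction)
    return data[n_val:], data[:n_val]


# Placeholder instances standing in for labelled pattern pairs (p_i, p_j, label).
pos = [("pos", i, +1) for i in range(60_000)]
neg = [("neg", i, -1) for i in range(60_000)]
train, val = balanced_split(pos, neg, n_per_class=50_000)
print(len(train) + len(val), len(train), len(val))  # 100000 90000 10000
```

Because the split is drawn after shuffling, the validation set is only approximately class-balanced; a stratified split would be the choice if exact balance matters.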
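The reported setup can also be captured as a configuration record, so that the gaps flagged in the Dataset Splits, Hardware Specification, and Software Dependencies rows stay visible when someone attempts a rerun. The class and field names below are hypothetical; the values come from the rows above, and `None` marks details the paper does not report.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ReportedSetup:
    """Settings taken from the checklist rows; None marks details the paper omits."""
    corpus: str = "ukWaC"
    corpus_tokens: int = 2_000_000_000
    vector_dim: int = 300
    positive_pairs: int = 50_000
    negative_pairs: int = 50_000
    iteration_bound: int = 5  # reported: converged in fewer than this many iterations
    validation_split: Optional[float] = None
    hardware: Optional[str] = None
    software_versions: Optional[str] = None

    @property
    def n_training_instances(self) -> int:
        """N = positives + negatives, as in the Dataset Splits row."""
        return self.positive_pairs + self.negative_pairs

    def missing(self):
        """Names of fields the paper leaves unreported."""
        return [name for name, value in asdict(self).items() if value is None]


setup = ReportedSetup()
print(setup.n_training_instances)  # 100000
print(setup.missing())  # ['validation_split', 'hardware', 'software_versions']
```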