Think Globally, Embed Locally --- Locally Linear Meta-embedding of Words

Authors: Danushka Bollegala, Kohei Hayashi, Ken-ichi Kawarabayashi

IJCAI 2018

Reproducibility Variable Result LLM Response
Research Type: Experimental. "Experimental results on semantic similarity, word analogy, relation classification, and short-text classification tasks show that our meta-embeddings significantly outperform prior methods in several benchmark datasets, establishing a new state of the art for meta-embeddings."
Researcher Affiliation: Academia. University of Liverpool, Liverpool, United Kingdom; National Institute of Advanced Industrial Science and Technology, Tokyo, Japan; National Institute of Informatics, Tokyo, Japan; Japan Science and Technology Agency, ERATO, Kawarabayashi Large Graph Project.
Pseudocode: No. The paper describes the method mathematically and textually in Sections 3.2 and 3.3, but it does not contain a structured pseudocode or algorithm block.
Open Source Code: Yes. "Source code for our implementation is available." https://github.com/LivNLP/LLE-MetaEmbed
Open Datasets: Yes. The paper uses five previously proposed pre-trained word embedding sets as source embeddings: HLBL (hierarchical log-bilinear) [Mnih and Hinton, 2009] embeddings released by Turian et al. [2010]; Huang [Huang et al., 2012]; GloVe [Pennington et al., 2014]; CW [Collobert and Weston, 2008]; and CBOW [Mikolov et al., 2013b]. Evaluation uses Rubenstein and Goodenough's dataset (RG) [Rubenstein and Goodenough, 1965]; the rare words dataset (RW) [Luong et al., 2013]; Stanford's contextual word similarities (SCWS) [Huang et al., 2012]; the MEN dataset [Bruni et al., 2012]; the SimLex dataset (SL) [Hill et al., 2015b]; the Google dataset (GL) [Mikolov et al., 2013b]; the SemEval (SE) dataset [Jurgens et al., 2012]; the DiffVec (DV) dataset [Vylomova et al., 2016]; the Stanford sentiment treebank (TR); and the movie reviews dataset (MR).
Dataset Splits: Yes. "We train a binary logistic regression classifier with a cross-validated ℓ2 regulariser using the train portion of each dataset, and evaluate the classification accuracy using the test portion of the dataset." ... "Using the MC validation dataset, we set d_P = 300."
Hardware Specification: No. The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory specifications) used to run the experiments.
Software Dependencies: No. The paper does not list ancillary software with version numbers (e.g., Python, PyTorch, or specific libraries).
Experiment Setup: Yes. "Using the MC dataset, we find the best values for the neighbourhood size n = 1200 and dimensionality d_P = 300 for the Proposed method." ... "The initial learning rate is set to 0.01 and the maximum number of iterations to 100 in our experiments."