Think Globally, Embed Locally --- Locally Linear Meta-embedding of Words
Authors: Danushka Bollegala, Kohei Hayashi, Ken-ichi Kawarabayashi
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on semantic similarity, word analogy, relation classification, and short-text classification tasks show that our meta-embeddings significantly outperform prior methods in several benchmark datasets, establishing a new state of the art for meta-embeddings. |
| Researcher Affiliation | Academia | University of Liverpool, Liverpool, United Kingdom; National Institute of Advanced Industrial Science and Technology, Tokyo, Japan; National Institute of Informatics, Tokyo, Japan; Japan Science and Technology Agency, ERATO, Kawarabayashi Large Graph Project |
| Pseudocode | No | The paper describes the method mathematically and textually in sections 3.2 and 3.3, but it does not contain a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Source code for our implementation is available at https://github.com/LivNLP/LLE-MetaEmbed |
| Open Datasets | Yes | We use five previously proposed pre-trained word embedding sets as the source embeddings in our experiments: HLBL (hierarchical log-bilinear) embeddings [Mnih and Hinton, 2009] released by Turian et al. [2010]... Huang: Huang et al. [2012]... GloVe: Pennington et al. [2014]... CW: Collobert and Weston [2008]... CBOW: Mikolov et al. [2013b]... We use Rubenstein and Goodenough's dataset [Rubenstein and Goodenough, 1965] (RG), the rare words dataset (RW) [Luong et al., 2013], Stanford's contextual word similarities (SCWS) [Huang et al., 2012], the MEN dataset [Bruni et al., 2012], and the SimLex dataset (SL) [Hill et al., 2015b]... the Google dataset (GL) [Mikolov et al., 2013b], and the SemEval (SE) dataset [Jurgens et al., 2012]... the DiffVec (DV) dataset [Vylomova et al., 2016]... the Stanford sentiment treebank (TR) and the movie reviews dataset (MR). |
| Dataset Splits | Yes | We train a binary logistic regression classifier with a cross-validated ℓ2 regulariser using the train portion of each dataset, and evaluate the classification accuracy using the test portion of the dataset. ... Using the MC validation dataset, we set dP = 300. (A hedged sketch of this evaluation protocol follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | Using the MC dataset, we find the best values for the neighbourhood size n = 1200 and dimensionality dP = 300 for the Proposed method. ... The initial learning rate is set to 0.01 and the maximum number of iterations to 100 in our experiments. (These reported values are collected in the configuration sketch after the table.) |
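
The short-text classification evaluation quoted in the Dataset Splits row (a binary logistic regression classifier with a cross-validated ℓ2 regulariser, trained on the train portion and scored on the test portion) can be sketched as below. This is a minimal illustration assuming scikit-learn; the feature construction (averaging meta-embeddings of the words in each text) and all function names are assumptions for readability, not details taken from the paper.

```python
# Hedged sketch of the evaluation protocol in the "Dataset Splits" row.
# Assumptions: texts are featurised by averaging the meta-embeddings of
# their in-vocabulary words; scikit-learn is used for the classifier.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def text_features(texts, embeddings, dim):
    """Average the meta-embedding vectors of the in-vocabulary words in each text."""
    feats = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        vecs = [embeddings[w] for w in text.split() if w in embeddings]
        if vecs:
            feats[i] = np.mean(vecs, axis=0)
    return feats

def evaluate_split(train_texts, train_labels, test_texts, test_labels,
                   embeddings, dim=300):
    X_train = text_features(train_texts, embeddings, dim)
    X_test = text_features(test_texts, embeddings, dim)
    # LogisticRegressionCV selects the l2 regularisation strength by
    # cross-validation on the training portion only.
    clf = LogisticRegressionCV(Cs=10, cv=5, penalty="l2", max_iter=1000)
    clf.fit(X_train, train_labels)
    return clf.score(X_test, test_labels)  # classification accuracy on the test portion
```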
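The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration object for reference. Only the numeric values (n = 1200, dP = 300, initial learning rate 0.01, 100 iterations) come from the paper; the class and field names are hypothetical.

```python
# Hedged configuration sketch; field names are assumptions, values are as
# reported in the paper's experiment setup.
from dataclasses import dataclass

@dataclass
class LLEMetaEmbedConfig:
    n_neighbours: int = 1200        # neighbourhood size n, tuned on the MC validation set
    meta_dim: int = 300             # meta-embedding dimensionality dP, tuned on MC
    initial_learning_rate: float = 0.01
    max_iterations: int = 100

config = LLEMetaEmbedConfig()
print(config)
```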