Learning Meta Word Embeddings by Unsupervised Weighted Concatenation of Source Embeddings
Authors: Danushka Bollegala
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on multiple benchmark datasets show that the proposed weighted concatenated meta-embedding methods outperform previously proposed meta-embedding learning methods. |
| Researcher Affiliation | Collaboration | Danushka Bollegala, University of Liverpool, Amazon. danushka@liverpool.ac.uk |
| Pseudocode | No | The paper describes methods using text and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code for reproducing the results reported in this paper is publicly available at https://github.com/LivNLP/meta-concat |
| Open Datasets | Yes | We used the MTurk-771 as a development dataset... Table 1 shows the Spearman correlation coefficient on two true word similarity datasets SimLex and SimVerb... To evaluate meta-embeddings for Part-of-Speech (PoS) tagging as a downstream task, we initialise an LSTM with pretrained source/meta embeddings and train a PoS tagger using the CoNLL-2003 train dataset... To conduct a fair evaluation, we train GloVe (k = 736, σ = 0.1472), SGNS (k = 121, σ = 0.3566) and LSA (k = 119, σ = 0.3521) on the Text8 corpus as the source embeddings... we use evaluation tasks and benchmark datasets used in prior work: (1) word similarity prediction (Word Similarity-353 (WS), Rubenstein-Goodenough (RG), rare words (RW), Multimodal Distributional Semantics (MEN)), (2) analogy detection (Google analogies (GL), Microsoft Research syntactic analogy dataset (MSR)), (3) sentiment classification (movie reviews (MR), customer reviews (CR), opinion polarity (MPQA)), (4) semantic textual similarity benchmark (STS), (5) textual entailment (Ent) and (6) relatedness (SICK). |
| Dataset Splits | No | We used the MTurk-771 as a development dataset to estimate the co-occurrence window (set to 5 tokens) and β (set to 3) in our experiments. Hyperparameters for those methods were tuned on MTurk-771 as reported in the Supplementary. |
| Hardware Specification | Yes | The average run times of SW and DW are ca. 30 min (wall clock time) measured on an EC2 p3.2xlarge instance. |
| Software Dependencies | No | The paper mentions using LSTM and SVD but does not provide specific version numbers for software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | We used the MTurk-771 as a development dataset to estimate the co-occurrence window (set to 5 tokens) and β (set to 3) in our experiments. Moreover, α is set to 0.5 for all source embeddings, which reported the best performance on MTurk-771. |
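To make the reported setup concrete: the paper's core operation is a weighted concatenation of source embeddings, with the weight α = 0.5 applied to every source. The sketch below illustrates that operation for the three sources and dimensionalities quoted above (GloVe k = 736, SGNS k = 121, LSA k = 119); the function name, the per-vector L2 normalisation, and the random example vectors are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def weighted_concat(sources, weights):
    """Concatenate source embeddings for one word, each L2-normalised
    and scaled by its (per-source) concatenation weight.

    sources : list of 1-D numpy arrays (one per source embedding)
    weights : list of floats, same length as `sources`
    """
    parts = []
    for e, w in zip(sources, weights):
        norm = np.linalg.norm(e)
        parts.append(w * (e / norm if norm > 0 else e))
    return np.concatenate(parts)

# Hypothetical source embeddings for a single word, using the
# dimensionalities reported in the paper's setup.
rng = np.random.default_rng(0)
glove = rng.random(736)   # k = 736
sgns = rng.random(121)    # k = 121
lsa = rng.random(119)     # k = 119

# alpha = 0.5 for all sources, as tuned on MTurk-771.
meta = weighted_concat([glove, sgns, lsa], [0.5, 0.5, 0.5])
print(meta.shape)  # (976,) = 736 + 121 + 119
```

The resulting meta-embedding dimensionality is simply the sum of the source dimensionalities; learning the weights (rather than fixing them at 0.5) is what distinguishes the paper's SW/DW variants.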