Lifelong Domain Word Embedding via Meta-Learning

Authors: Hu Xu, Bing Liu, Lei Shu, Philip S. Yu

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that domain embeddings produced from such a process improve the performance of the downstream tasks. We use the Amazon Review datasets from [He and McAuley, 2016], which is a collection of multiple-domain corpora. Table 2 shows the main results. We observe that the proposed method L-DEM 200D + ND 30M performs the best.
Researcher Affiliation | Academia | 1 Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA; 2 Institute for Data Science, Tsinghua University, Beijing, China. {hxu48, liub, lshu3, psyu}@uic.edu
Pseudocode | Yes | Algorithm 1: Identifying Context Words from the Past
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a direct link to a code repository.
Open Datasets | Yes | We use the Amazon Review datasets from [He and McAuley, 2016], which is a collection of multiple-domain corpora.
Dataset Splits | Yes | We split the 56 domains into 39 domains for training, 5 domains for validation and 12 domains for testing. We select 3500 examples for training, 500 examples for validation and 2000 examples for testing.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU or GPU models, memory) used for the experiments; it only mentions "limited computing resource".
Software Dependencies | No | The paper mentions software components such as the "skip-gram model", "Adam optimizer", and "Bi-LSTM model" but does not provide specific version numbers for any of the software or libraries used in the implementation.
Experiment Setup | Yes | We set the size of a context window to be 5 when building feature vectors. We use the default hyperparameters of the skip-gram model [Mikolov et al., 2013b] to train the domain embeddings. We apply a dropout rate of 0.5 on all layers except the last one and use Adam [Kingma and Ba, 2014] as the optimizer. We empirically set δ = 0.7 as the threshold on the similarity score in Algorithm 1.
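
The Experiment Setup row quotes the δ = 0.7 similarity threshold used in Algorithm 1 (Identifying Context Words from the Past). A minimal sketch of such threshold-based selection, assuming cosine similarity over word feature vectors; the function and variable names here are illustrative and not taken from the paper:

```python
import math

DELTA = 0.7  # similarity threshold reported in the paper


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_context_words(new_domain_feats, past_domain_feats, delta=DELTA):
    """Keep a past domain's context word only when its feature vector is
    similar enough (>= delta) to the same word's new-domain feature vector."""
    selected = []
    for word, past_vec in past_domain_feats.items():
        new_vec = new_domain_feats.get(word)
        if new_vec is not None and cosine_similarity(new_vec, past_vec) >= delta:
            selected.append(word)
    return selected
```

For example, a word whose past-domain vector points in nearly the same direction as its new-domain vector passes the δ = 0.7 gate, while an orthogonal (domain-shifted) word is filtered out.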