Unsupervised Neural Aspect Extraction with Sememes

Authors: Ling Luo, Xiang Ao, Yan Song, Jinyao Li, Xiaopeng Yang, Qing He, Dong Yu

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on two real-world datasets demonstrate the validity and the effectiveness of our models, which significantly outperforms existing baselines.
Researcher Affiliation Collaboration Ling Luo (1,5), Xiang Ao (1,5), Yan Song (2), Jinyao Li (3,5), Xiaopeng Yang (4), Qing He (1,5) and Dong Yu (2); 1 Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 2 Tencent AI Lab; 3 Institute of Software, Chinese Academy of Sciences; 4 David R. Cheriton School of Computer Science, Faculty of Mathematics, University of Waterloo; 5 University of Chinese Academy of Sciences
Pseudocode No The paper describes the models with equations and diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide an explicit statement or link regarding the release of its source code.
Open Datasets Yes Citysearch corpus: contains over 50,000 restaurant reviews from Citysearch New York; an annotated subset of 3,400 sentences is used for evaluation [Ganu et al., 2009]. BeerAdvocate: a beer review corpus provided in [McAuley et al., 2012] containing more than 1.5 million reviews.
Dataset Splits No The paper mentions using annotated subsets for evaluation and refers to a "test set" in its qualitative analysis, but it does not provide specific training/validation/test splits (e.g., percentages or counts) or reference a predefined split that would allow the data partitioning to be reproduced.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies No The paper mentions using the NLTK POS tagger, word2vec, WordNet, and the Adam optimizer, but does not specify any software names with version numbers required for reproduction.
Experiment Setup Yes In preprocessing, punctuation, stop words, and words appearing fewer than 10 times in the corpus are removed. The word vocabulary size is set to 9,000 for Restaurant and 11,000 for Beer. The word embedding matrix is initialized with word2vec trained on the experimental datasets, the embedding size is set to 200, and word embeddings are fixed during training. Adam is employed as the optimizer with a learning rate of 0.001. The orthogonality penalty weight λ is set to 2 on Restaurant and 2.5 on Beer. For both datasets, the number of negative samples q is 20, the dimensions of S, S_i, and S̃_i are 200, and the hidden size of the RNN structure h_rnn is 500.
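
The reported setup can be summarized in a short configuration sketch. The following is a minimal Python sketch, assuming illustrative names (preprocess, config) that are not taken from the paper or its code; the values mirror the Restaurant setting, with Beer values noted in comments.

```python
# Minimal sketch of the reported preprocessing and hyperparameters (illustrative only).
import string
from collections import Counter

from nltk.corpus import stopwords  # requires: nltk.download("stopwords")

STOP_WORDS = set(stopwords.words("english"))
PUNCTUATION = set(string.punctuation)

def preprocess(tokenized_reviews, min_count=10):
    """Drop punctuation, stop words, and words seen fewer than min_count times."""
    counts = Counter(tok for review in tokenized_reviews for tok in review)
    return [
        [tok for tok in review
         if tok not in PUNCTUATION
         and tok not in STOP_WORDS
         and counts[tok] >= min_count]
        for review in tokenized_reviews
    ]

# Hyperparameters as reported for Restaurant (Beer values in comments).
config = {
    "vocab_size": 9_000,          # 11,000 for Beer
    "embedding_dim": 200,         # word2vec trained on the corpus, frozen during training
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "ortho_penalty_lambda": 2.0,  # 2.5 for Beer
    "num_negative_samples": 20,   # q
    "sentence_aspect_dims": 200,  # dimensions of S, S_i, S̃_i
    "rnn_hidden_size": 500,       # h_rnn
}
```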