Unsupervised Neural Aspect Extraction with Sememes

Authors: Ling Luo, Xiang Ao, Yan Song, Jinyao Li, Xiaopeng Yang, Qing He, Dong Yu

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on two real-world datasets demonstrate the validity and the effectiveness of our models, which significantly outperforms existing baselines.
Researcher Affiliation Collaboration Ling Luo (1,5), Xiang Ao (1,5), Yan Song (2), Jinyao Li (3,5), Xiaopeng Yang (4), Qing He (1,5) and Dong Yu (2); 1 Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 2 Tencent AI Lab; 3 Institute of Software, Chinese Academy of Sciences; 4 David R. Cheriton School of Computer Science, Faculty of Mathematics, University of Waterloo; 5 University of Chinese Academy of Sciences
Pseudocode No The paper describes the models with equations and diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide an explicit statement or link regarding the release of its source code.
Open Datasets Yes Citysearch corpus: contains over 50,000 restaurant reviews from Citysearch New York; an annotated subset of 3,400 sentences is used for evaluation [Ganu et al., 2009]. BeerAdvocate: a beer review corpus provided in [McAuley et al., 2012] containing more than 1.5 million reviews.
Dataset Splits No The paper mentions using annotated subsets for evaluation and refers to a "test set" in its qualitative analysis, but it does not provide specific training/validation/test splits (e.g., percentages or counts) or reference a predefined split that would allow the data partitioning to be reproduced.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies No The paper mentions using the NLTK POS tagger, word2vec, WordNet, and the Adam optimizer, but does not specify any software names with version numbers required for reproduction.
Experiment Setup Yes In preprocessing, punctuation, stop words, and words appearing fewer than 10 times in the corpus are removed. The word vocabulary size is set to 9,000 for Restaurant and 11,000 for Beer. The word embedding matrix is initialized with word2vec trained on the experimental datasets, the embedding size is set to 200, and word embeddings are fixed during training. Adam is employed as the optimizer with a learning rate of 0.001. The orthogonality penalty weight λ is set to 2 on Restaurant and 2.5 on Beer. For both datasets, the number of negative samples q is 20, the dimensions of S, S_i, and S̃_i are 200, and the hidden size of the RNN structure h_rnn is 500.
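
The reported setup can be summarized in a short configuration sketch. The following is a minimal Python sketch, assuming illustrative names (preprocess, config) that are not taken from the paper or its code; the values mirror the Restaurant setting, with Beer values noted in comments.

```python
# Minimal sketch of the reported preprocessing and hyperparameters (illustrative only).
import string
from collections import Counter

from nltk.corpus import stopwords  # requires: nltk.download("stopwords")

STOP_WORDS = set(stopwords.words("english"))
PUNCTUATION = set(string.punctuation)

def preprocess(tokenized_reviews, min_count=10):
    """Drop punctuation, stop words, and words seen fewer than min_count times."""
    counts = Counter(tok for review in tokenized_reviews for tok in review)
    return [
        [tok for tok in review
         if tok not in PUNCTUATION
         and tok not in STOP_WORDS
         and counts[tok] >= min_count]
        for review in tokenized_reviews
    ]

# Hyperparameters as reported for Restaurant (Beer values in comments).
config = {
    "vocab_size": 9_000,          # 11,000 for Beer
    "embedding_dim": 200,         # word2vec trained on the corpus, frozen during training
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "ortho_penalty_lambda": 2.0,  # 2.5 for Beer
    "num_negative_samples": 20,   # q
    "sentence_aspect_dims": 200,  # dimensions of S, S_i, S̃_i
    "rnn_hidden_size": 500,       # h_rnn
}
```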