Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction

Authors: Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, Ming Zhou

IJCAI 2016

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Experimental results on the SemEval datasets show that (1) with only embedding features, we can achieve state-of-the-art results; (2) our embedding method, which incorporates the syntactic information among words, yields better performance than other representative ones in aspect term extraction." |
| Researcher Affiliation | Collaboration | Yichun Yin¹, Furu Wei², Li Dong³, Kaimeng Xu¹, Ming Zhang¹, Ming Zhou². Affiliations: ¹School of EECS, Peking University; ²Microsoft Research; ³Institute for Language, Cognition and Computation, University of Edinburgh. |
| Pseudocode | No | The paper describes model training and feature construction in natural language and mathematical equations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper links to implementations of the *baseline* models used for comparison, but it neither states that the code for the proposed method is publicly released nor provides a link to it. |
| Open Datasets | Yes | "We conduct our experiments on the SemEval 2014 and 2015 datasets." The unlabeled corpora are the Yelp dataset (https://www.yelp.com/academic_dataset) and the Amazon dataset (https://snap.stanford.edu/data/web-Amazon.html), which are in-domain corpora for the restaurant and laptop domains, respectively. |
| Dataset Splits | Yes | "In order to choose l and d, we use 80% sentences in training data as training set, and the rest 20% as development set." |
| Hardware Specification | No | The paper mentions "asynchronous gradient descent for parallel training" but provides no specifics on the hardware used, such as GPU/CPU models, memory, or processing units. |
| Software Dependencies | No | The paper mentions Stanford CoreNLP and "an available CRF tool" (crfsharp.codeplex.com) but does not give version numbers for these software dependencies. |
| Experiment Setup | Yes | The dimensions of the word and dependency path embeddings are set to 100; larger dimensions yield similar development-set results but require more training time. The parameter l is set to 15, which performs best on the development set. Training uses asynchronous gradient descent; following the learning-rate strategy of [Mikolov et al., 2013a], the rate is decreased linearly over the training instances from an initial value of 0.001. |
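The development-set protocol quoted in the Dataset Splits row (80% of training sentences for training, the remaining 20% for tuning l and d) can be sketched as below. The paper does not say whether the split is random or contiguous, so the contiguous split and the function name are assumptions for illustration:

```python
def dev_split(sentences, train_frac=0.8):
    """Hold out the last (1 - train_frac) of sentences as a development set.

    Assumption: the paper reports an 80/20 split of the training sentences
    but not how it was drawn; this contiguous split is only a sketch.
    """
    cut = int(len(sentences) * train_frac)
    return sentences[:cut], sentences[cut:]

train, dev = dev_split(list(range(100)))
# 80 sentences for training, 20 for development
```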
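The learning-rate schedule described in the Experiment Setup row (word2vec-style linear decay over training instances, starting from 0.001) can be sketched as follows. This is a minimal illustration, not the authors' code; the clamp at a small fraction of the initial rate mirrors the word2vec implementation and is an assumption here:

```python
def linear_lr(initial_lr, instances_seen, total_instances, floor=1e-4):
    """Decrease the learning rate linearly with the number of training
    instances processed, clamped at floor * initial_lr (the clamp value
    is an assumption borrowed from the word2vec reference implementation).
    """
    remaining = 1.0 - instances_seen / total_instances
    return initial_lr * max(remaining, floor)

# Initial rate 0.001 as in the paper; halfway through training:
lr = linear_lr(0.001, instances_seen=5_000, total_instances=10_000)
# lr is now 0.0005
```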