Aspect Term Extraction with History Attention and Selective Transformation

Authors: Xin Li, Lidong Bing, Piji Li, Wai Lam, Zhimou Yang

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results over four benchmark datasets clearly demonstrate that our framework can outperform all state-of-the-art methods.
Researcher Affiliation | Collaboration | Xin Li (1), Lidong Bing (2), Piji Li (1), Wai Lam (1), Zhimou Yang (3); (1) Key Laboratory of High Confidence Software Technologies, Ministry of Education (CUHK Sub-Lab), Dept. of Systems Engineering & Engineering Management, The Chinese University of Hong Kong; (2) Tencent AI Lab, Shenzhen, China; (3) College of Information Science and Engineering, Northeastern University, China; {lixin, wlam, pjli}@se.cuhk.edu.hk, lyndonbing@tencent.com, yangzhimou@stumail.neu.edu.cn
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | Yes | Codes are available at https://github.com/lixin4ever/HAST.
Open Datasets | Yes | To evaluate the effectiveness of the proposed framework for the ATE task, we conduct experiments over four benchmark datasets from the SemEval ABSA challenge [Pontiki et al., 2014; Pontiki et al., 2015; Pontiki et al., 2016]. Table 1 shows their statistics.
Dataset Splits | Yes | With 5-fold cross-validation on the training data of D2, other hyper-parameters are set as follows: dim^A_h = 100, dim^O_h = 30; the number of cached historical aspect representations N^A is 5; the learning rate of SGD is 0.07. (A cross-validation sketch is given after the table.)
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) were mentioned for the experimental setup.
Software Dependencies | No | No specific software dependencies with version numbers were mentioned, only general tools like 'sklearn-crfsuite'.
Experiment Setup | Yes | With 5-fold cross-validation on the training data of D2, other hyper-parameters are set as follows: dim^A_h = 100, dim^O_h = 30; the number of cached historical aspect representations N^A is 5; the learning rate of SGD is 0.07. We pre-processed each dataset by lowercasing all words and replacing all punctuation with PUNCT. We use pre-trained GloVe 840B vectors [Pennington et al., 2014] to initialize the word embeddings, and the dimension (i.e., dim_w) is 300. For out-of-vocabulary words, we randomly sample their embeddings from the uniform distribution U(-0.25, 0.25) as done in [Kim, 2014]. All of the weight matrices except those in LSTMs are initialized from the uniform distribution U(-0.2, 0.2). For the initialization of the matrices in LSTMs, we adopt the Glorot Uniform strategy [Glorot and Bengio, 2010]. Besides, all biases are initialized as 0s. The model is trained with SGD. We apply dropout over the ultimate aspect/opinion features and the input word embeddings of LSTMs. The dropout rates are empirically set to 0.5. (An initialization and training sketch is given after the table.)
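
The hyper-parameter values quoted in the Dataset Splits and Experiment Setup rows were selected with 5-fold cross-validation on the D2 training data. The sketch below shows one way such a search could be reproduced; the loader `load_d2_training_data`, the training routine `train_and_score`, and the candidate grids are hypothetical placeholders and are not part of the released HAST code.

```python
# Hypothetical sketch of a 5-fold cross-validation hyper-parameter search on the
# D2 training set. Loader, training routine, and candidate grids are placeholders.
from itertools import product

import numpy as np
from sklearn.model_selection import KFold


def cv_score(sentences, labels, dim_h_a, dim_h_o, n_a, lr):
    """Mean dev F1 over 5 folds for one hyper-parameter setting."""
    fold_scores = []
    for train_idx, dev_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(sentences):
        f1 = train_and_score(  # hypothetical: trains HAST on the fold and returns dev F1
            train=[(sentences[i], labels[i]) for i in train_idx],
            dev=[(sentences[i], labels[i]) for i in dev_idx],
            dim_h_a=dim_h_a, dim_h_o=dim_h_o, n_a=n_a, lr=lr,
        )
        fold_scores.append(f1)
    return float(np.mean(fold_scores))


sentences, labels = load_d2_training_data()  # hypothetical loader for the D2 training set
grid = product([50, 100, 150],   # candidate dim^A_h values (illustrative)
               [30, 50],         # candidate dim^O_h values (illustrative)
               [3, 5, 7],        # candidate N^A values (illustrative)
               [0.01, 0.07, 0.1])  # candidate SGD learning rates (illustrative)
best = max(grid, key=lambda cfg: cv_score(sentences, labels, *cfg))
# The paper reports the selected values: dim^A_h = 100, dim^O_h = 30, N^A = 5, lr = 0.07.
```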
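
The initialization and training settings quoted in the Experiment Setup row can be summarized in code. The sketch below is written against PyTorch purely for illustration and assumes hypothetical module and parameter names (`embedding`, `lstm`); the released HAST code may organise this differently.

```python
# Illustrative PyTorch sketch of the quoted initialization and training settings.
# Module/parameter names ("embedding", "lstm") are assumptions made for this sketch.
import torch
import torch.nn as nn


def initialize_hast_style(model: nn.Module, embedding: nn.Embedding, glove: dict, vocab: list):
    # Word embeddings: pre-trained 300-d GloVe 840B vectors; out-of-vocabulary words
    # are sampled from U(-0.25, 0.25).
    with torch.no_grad():
        for idx, word in enumerate(vocab):
            if word in glove:
                embedding.weight[idx] = torch.as_tensor(glove[word], dtype=embedding.weight.dtype)
            else:
                embedding.weight[idx].uniform_(-0.25, 0.25)
    # Biases start at zero, LSTM weight matrices use Glorot (Xavier) uniform, and all
    # remaining weight matrices are drawn from U(-0.2, 0.2).
    for name, param in model.named_parameters():
        if "embedding" in name:
            continue
        if "bias" in name:
            nn.init.zeros_(param)
        elif "lstm" in name:
            nn.init.xavier_uniform_(param)
        else:
            nn.init.uniform_(param, -0.2, 0.2)


# Training: plain SGD with learning rate 0.07; dropout of 0.5 is applied to the input
# word embeddings of the LSTMs and to the final aspect/opinion features, e.g.:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.07)
# dropout = nn.Dropout(p=0.5)
```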