Aspect Term Extraction with History Attention and Selective Transformation
Authors: Xin Li, Lidong Bing, Piji Li, Wai Lam, Zhimou Yang
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results over four benchmark datasets clearly demonstrate that our framework can outperform all state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Xin Li¹, Lidong Bing², Piji Li¹, Wai Lam¹, Zhimou Yang³. ¹Key Laboratory of High Confidence Software Technologies, Ministry of Education (CUHK Sub-Lab), Dept. of Systems Engineering & Engineering Management, The Chinese University of Hong Kong; ²Tencent AI Lab, Shenzhen, China; ³College of Information Science and Engineering, Northeastern University, China. {lixin, wlam, pjli}@se.cuhk.edu.hk, lyndonbing@tencent.com, yangzhimou@stumail.neu.edu.cn |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Codes are available at https://github.com/lixin4ever/HAST. |
| Open Datasets | Yes | To evaluate the effectiveness of the proposed framework for the ATE task, we conduct experiments over four benchmark datasets from the SemEval ABSA challenge [Pontiki et al., 2014; Pontiki et al., 2015; Pontiki et al., 2016]. Table 1 shows their statistics. |
| Dataset Splits | Yes | With 5-fold cross-validation on the training data of D2, other hyper-parameters are set as follows: dim_h^A = 100, dim_h^O = 30; the number of cached historical aspect representations N^A is 5; the learning rate of SGD is 0.07. (A cross-validation sketch follows the table.) |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) were mentioned for the experimental setup. |
| Software Dependencies | No | No software dependencies with version numbers were mentioned; only general tools such as sklearn-crfsuite are referenced. |
| Experiment Setup | Yes | With 5-fold cross-validation on the training data of D2, other hyper-parameters are set as follows: dim_h^A = 100, dim_h^O = 30; the number of cached historical aspect representations N^A is 5; the learning rate of SGD is 0.07. We pre-processed each dataset by lowercasing all words and replacing all punctuation with PUNCT. We use pre-trained GloVe 840B vectors [Pennington et al., 2014] to initialize the word embeddings and the dimension (i.e., dim_w) is 300. For out-of-vocabulary words, we randomly sample their embeddings from the uniform distribution U(-0.25, 0.25) as done in [Kim, 2014]. All of the weight matrices except those in LSTMs are initialized from the uniform distribution U(-0.2, 0.2). For the initialization of the matrices in LSTMs, we adopt the Glorot Uniform strategy [Glorot and Bengio, 2010]. Besides, all biases are initialized as 0s. The model is trained with SGD. We apply dropout over the ultimate aspect/opinion features and the input word embeddings of LSTMs. The dropout rates are empirically set as 0.5. (A hedged initialization/training sketch follows the table.) |
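
The 5-fold cross-validation tuning quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration, assuming a hypothetical `train_and_score` helper that trains the model on one fold split and returns a validation F1 score; the candidate grids for the learning rate and N^A are assumptions, not the paper's actual search space.

```python
# Hedged sketch of hyper-parameter selection with 5-fold cross-validation on the
# D2 training data, as quoted in the Dataset Splits row. `train_and_score` is a
# hypothetical helper standing in for one training/evaluation run of the model.
from itertools import product
from sklearn.model_selection import KFold

candidate_lrs = [0.01, 0.05, 0.07, 0.1]   # assumed grid; the paper selects 0.07
candidate_n_history = [3, 5, 7]            # assumed grid for N^A; the paper selects 5

def select_hyperparams(train_sentences, train_labels, train_and_score):
    """Return the (lr, N^A) pair with the best mean F1 over 5 folds."""
    best_config, best_f1 = None, -1.0
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    for lr, n_a in product(candidate_lrs, candidate_n_history):
        scores = []
        for fold_train, fold_dev in kf.split(train_sentences):
            scores.append(train_and_score(
                [train_sentences[i] for i in fold_train],
                [train_labels[i] for i in fold_train],
                [train_sentences[i] for i in fold_dev],
                [train_labels[i] for i in fold_dev],
                lr=lr, n_history=n_a))
        mean_f1 = sum(scores) / len(scores)
        if mean_f1 > best_f1:
            best_config, best_f1 = (lr, n_a), mean_f1
    return best_config, best_f1
```

Only the training portion of D2 is split here; the official test sets stay untouched, matching the quoted protocol.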
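
The Experiment Setup row translates into concrete initialization and optimizer choices. Below is a minimal PyTorch sketch under assumed module shapes (the vocabulary size, bidirectional LSTMs, and a 3-tag output projection are illustrative placeholders); only the numeric settings quoted above (300-d GloVe embeddings, U(-0.25, 0.25) for OOV embeddings, U(-0.2, 0.2) for non-LSTM weight matrices, Glorot/Xavier uniform for LSTM matrices, zero biases, dropout 0.5, SGD with learning rate 0.07) come from the paper's text.

```python
# Hedged sketch of the quoted initialization/training settings; module layout is assumed.
import torch
import torch.nn as nn

dim_w, dim_h_A, dim_h_O = 300, 100, 30   # embedding and hidden sizes from the paper

# Pre-trained GloVe 840B rows would be copied into this matrix; rows for
# out-of-vocabulary words are sampled from U(-0.25, 0.25). The vocabulary size
# is a placeholder.
embedding = nn.Embedding(num_embeddings=10000, embedding_dim=dim_w)
nn.init.uniform_(embedding.weight, -0.25, 0.25)

aspect_lstm = nn.LSTM(dim_w, dim_h_A, bidirectional=True, batch_first=True)
opinion_lstm = nn.LSTM(dim_w, dim_h_O, bidirectional=True, batch_first=True)

def init_lstm(lstm: nn.LSTM) -> None:
    """Glorot (Xavier) uniform for LSTM weight matrices, zeros for all biases."""
    for name, param in lstm.named_parameters():
        if "weight" in name:
            nn.init.xavier_uniform_(param)
        else:
            nn.init.zeros_(param)

for lstm in (aspect_lstm, opinion_lstm):
    init_lstm(lstm)

# All other weight matrices use U(-0.2, 0.2); a 3-tag output layer is assumed here.
output_proj = nn.Linear(2 * dim_h_A, 3)
nn.init.uniform_(output_proj.weight, -0.2, 0.2)
nn.init.zeros_(output_proj.bias)

# Dropout of 0.5 over the input word embeddings and the final aspect/opinion features.
dropout = nn.Dropout(p=0.5)

params = (list(embedding.parameters()) + list(aspect_lstm.parameters())
          + list(opinion_lstm.parameters()) + list(output_proj.parameters()))
optimizer = torch.optim.SGD(params, lr=0.07)
```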