Multi-Domain Sentiment Classification Based on Domain-Aware Embedding and Attention
Authors: Yitao Cai, Xiaojun Wan
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation results on public datasets with 16 different domains demonstrate the efficacy of our proposed model, which achieves state-of-the-art performance on the multi-domain sentiment classification task. |
| Researcher Affiliation | Academia | Yitao Cai and Xiaojun Wan; Institute of Computer Science and Technology, Peking University; The MOE Key Laboratory of Computational Linguistics, Peking University; Center for Data Science, Peking University; {caiyitao, wanxiaojun}@pku.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our code will be released. |
| Open Datasets | Yes | We use the datasets released by [Liu et al., 2017] for multi-domain sentiment classification, which consist of product and movie reviews in 16 different domains. |
| Dataset Splits | Yes | The data in each domain is randomly split into training set, development set and test set according to the proportion of 70%, 10%, 20%. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'Glove vectors', 'Adam optimizer', and 'BERT' but does not specify their version numbers or other ancillary software details. |
| Experiment Setup | Yes | We initialize word embeddings with 200-dimensional GloVe vectors [Pennington et al., 2014]. The word embeddings are fixed during training. Other parameters are initialized by sampling from a normal distribution whose standard deviation is 0.1. The mini-batch size is 128. Each batch contains 8 samples from every domain. We use the Adam optimizer [Kingma and Ba, 2014] with an initial learning rate of 0.004. The hidden size of each LSTM is 128. The weights γ_d and γ_s of the domain classification loss and the sentiment classification loss are set to 0.1 and 1 respectively, after a small grid search over [1, 0.1, 0.05]. To alleviate overfitting, we use dropout with a probability of 0.5 and L2 regularization with a parameter of 1e-8. We first train the domain classifier and the sentiment classifier jointly. After 10 epochs, we train only the sentiment classifier, setting γ_d to 0. At last, we fine-tune our model on each task. |
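The 70%/10%/20% per-domain split and the 8-samples-per-domain batch composition reported above can be made concrete with a short sketch. This is an illustrative reconstruction, not the authors' released code; the function names, the seeded shuffle, and the `train_by_domain` dictionary interface are assumptions.

```python
import random

def split_domain(reviews, seed=0):
    """Randomly split one domain's reviews into 70% train / 10% dev / 20% test,
    matching the proportions reported in the paper (the seeded shuffle is an assumption)."""
    rng = random.Random(seed)
    data = list(reviews)
    rng.shuffle(data)
    n = len(data)
    n_train, n_dev = int(0.7 * n), int(0.1 * n)
    return data[:n_train], data[n_train:n_train + n_dev], data[n_train + n_dev:]

def make_batch(train_by_domain, rng, per_domain=8):
    """Draw 8 samples from each of the 16 domains, giving the 128-example
    mini-batches described in the experiment setup."""
    batch = []
    for domain, examples in train_by_domain.items():
        batch.extend((domain, ex) for ex in rng.sample(examples, per_domain))
    return batch
```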
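The experiment-setup row can likewise be summarized as a hedged PyTorch training sketch. The hyperparameter values (Adam with learning rate 0.004, L2 weight 1e-8, loss weights γ_d = 0.1 and γ_s = 1, 10 joint epochs) are quoted from the paper, but the `model(texts)` interface, the loss composition, and the data loader are assumptions; the actual architecture (domain-aware embeddings, BiLSTMs with attention, dropout placement) and the final per-task fine-tuning stage are omitted.

```python
import torch
import torch.nn.functional as F

# Hyperparameter values quoted in the paper's experiment setup.
LR = 0.004
WEIGHT_DECAY = 1e-8           # L2 regularization parameter
GAMMA_D, GAMMA_S = 0.1, 1.0   # domain / sentiment loss weights
JOINT_EPOCHS = 10

def train(model, loader, num_epochs):
    # Assumption: model(texts) returns (sentiment_logits, domain_logits);
    # the real architecture and the frozen GloVe embeddings are not modeled here.
    optimizer = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
    for epoch in range(num_epochs):
        # After 10 joint epochs, the domain classification loss is switched off.
        gamma_d = GAMMA_D if epoch < JOINT_EPOCHS else 0.0
        for texts, sentiment_labels, domain_labels in loader:
            sentiment_logits, domain_logits = model(texts)
            loss = (GAMMA_S * F.cross_entropy(sentiment_logits, sentiment_labels)
                    + gamma_d * F.cross_entropy(domain_logits, domain_labels))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```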