Multiple Positional Self-Attention Network for Text Classification
Authors: Biyun Dai, Jinlong Li, Ruoyi Xu (pp. 7610-7617)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our sentence embeddings approach Multiple Positional Self-Attention Network (MPSAN), we perform the comparison experiments on sentiment analysis, semantic relatedness and sentence classification tasks. The result shows that our MPSAN outperforms state-of-the-art methods on five datasets and the test accuracy is improved by 0.81%, 0.6% on SST, CR datasets, respectively. |
| Researcher Affiliation | Academia | Biyun Dai, Jinlong Li, Ruoyi Xu School of Data Science University of Science and Technology of China Hefei, Anhui, China {byd, xuruoyi}@mail.ustc.edu.cn, jlli@ustc.edu.cn |
| Pseudocode | No | The paper provides formulas and architectural diagrams but does not include structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that its source code is publicly available, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We evaluate our model on six datasets including one sentiment analysis task and four sentence classification tasks and one semantic relatedness task. For all tasks except semantic relatedness task, we use the pre-trained vectors GloVe-6B-300D (Pennington, Socher, and Manning 2014) to initialize word embeddings in our MPSAN. We use Stanford Sentiment Treebank (SST) for sentiment analysis (Socher et al. 2013). TREC (Li and Roth 2002). Customer Reviews (CR) (Hu and Liu 2004). Multi-Perspective Question Answering (MPQA) dataset (Wiebe, Wilson, and Cardie 2005). SUBJectivity (SUBJ) dataset (Pang and Lee 2004). Sentences Involving Compositional Knowledge (SICK) dataset (Marelli et al. 2014). |
| Dataset Splits | Yes | SST consists of 8544/1101/2210 (train/dev/test) sentences with five fine-grained labels including very positive, positive, neutral, negative and very negative. |
| Hardware Specification | Yes | All the training progress is completed on a single NVidia GTX-1080Ti GPU card with TensorFlow-1.4.0. |
| Software Dependencies | Yes | All the training progress is completed on a single NVidia GTX-1080Ti GPU card with TensorFlow-1.4.0. |
| Experiment Setup | Yes (see the configuration sketch below the table) | We set the hidden units number to 300, which is equal to the word embedding size. All weight matrices in our model are initialized by Xavier (Glorot and Bengio 2010) initialization and all biases are zero-initialized. We add dropouts between the layers of our model, and the dropout-ratio is 0.7. All the activation functions σ(·) are Exponential Linear Unit (ELU) if they are not specified. We use the sum of cross-entropy loss and L2 regularization penalty as our loss function, and the L2 regularization decay factor is 10^-7. As for learning method, we use Adadelta, which is an optimizer of stochastic gradient descent, to minimize the loss function. The batch size of training is 64 and the learning rate is 0.5. |
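Based on the Experiment Setup row above, the following minimal TensorFlow 1.x sketch shows how the reported hyperparameters could be wired together. Since no source code is released (see the Open Source Code row), the classifier head, the `sentence_repr` placeholder, and the five-class output are illustrative assumptions; only the quoted values (300 hidden units, Xavier-initialized weights, zero-initialized biases, ELU activations, dropout-ratio 0.7, L2 decay factor 10^-7, Adadelta with learning rate 0.5, batch size 64) come from the paper.

```python
import tensorflow as tf  # the paper reports TensorFlow 1.4.0; TF 1.x API is used here

# Hyperparameters quoted from the paper; the classifier head, placeholder names and
# class count below are illustrative assumptions, not the authors' (unreleased) code.
HIDDEN_UNITS = 300    # equal to the GloVe-6B-300D word embedding size
KEEP_PROB = 0.7       # "dropout-ratio is 0.7", assumed here to be the keep probability
L2_DECAY = 1e-7       # L2 regularization decay factor
LEARNING_RATE = 0.5   # Adadelta learning rate
NUM_CLASSES = 5       # e.g. the five fine-grained SST labels


def dense(inputs, units, name, activation=tf.nn.elu):
    """Fully connected layer with Xavier-initialized weights and zero-initialized biases."""
    in_dim = inputs.get_shape().as_list()[-1]
    with tf.variable_scope(name):
        w = tf.get_variable("w", [in_dim, units],
                            initializer=tf.contrib.layers.xavier_initializer())
        b = tf.get_variable("b", [units], initializer=tf.zeros_initializer())
        out = tf.matmul(inputs, w) + b
        return activation(out) if activation is not None else out


# `sentence_repr` stands in for the pooled output of the MPSAN encoder, which is
# omitted here; `labels` are one-hot class targets.
sentence_repr = tf.placeholder(tf.float32, [None, HIDDEN_UNITS], name="sentence_repr")
labels = tf.placeholder(tf.float32, [None, NUM_CLASSES], name="labels")

# Dropout between layers, ELU on hidden layers, linear logits for the softmax loss.
hidden = tf.nn.dropout(dense(sentence_repr, HIDDEN_UNITS, "hidden"), keep_prob=KEEP_PROB)
logits = dense(hidden, NUM_CLASSES, "logits", activation=None)

# Loss = cross-entropy + L2 penalty with decay factor 10^-7, as reported.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
l2_penalty = L2_DECAY * tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
loss = cross_entropy + l2_penalty

# Adadelta optimizer with learning rate 0.5; batches of 64 examples would be fed
# through `sentence_repr`/`labels` in a tf.Session training loop.
train_op = tf.train.AdadeltaOptimizer(learning_rate=LEARNING_RATE).minimize(loss)
```

Note that the excerpt does not state whether the reported dropout-ratio of 0.7 is a keep probability or a drop probability; the sketch treats it as the keep probability passed to `tf.nn.dropout`.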