Multiple Positional Self-Attention Network for Text Classification
Authors: Biyun Dai, Jinlong Li, Ruoyi Xu (pp. 7610-7617)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our sentence embeddings approach Multiple Positional Self-Attention Network (MPSAN), we perform the comparison experiments on sentiment analysis, semantic relatedness and sentence classification tasks. The result shows that our MPSAN outperforms state-of-the-art methods on five datasets and the test accuracy is improved by 0.81%, 0.6% on SST, CR datasets, respectively. |
| Researcher Affiliation | Academia | Biyun Dai, Jinlong Li, Ruoyi Xu School of Data Science University of Science and Technology of China Hefei, Anhui, China {byd, xuruoyi}@mail.ustc.edu.cn, jlli@ustc.edu.cn |
| Pseudocode | No | The paper provides formulas and architectural diagrams but does not include structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that its source code is publicly available, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We evaluate our model on six datasets including one sentiment analysis task and four sentence classification tasks and one semantic relatedness task. For all tasks except semantic relatedness task, we use the pre-trained vectors GloVe-6B-300D (Pennington, Socher, and Manning 2014) to initialize word embeddings in our MPSAN. We use Stanford Sentiment Treebank (SST) for sentiment analysis (Socher et al. 2013). TREC (Li and Roth 2002). Customer Reviews (CR) (Hu and Liu 2004). Multi-Perspective Question Answering (MPQA) dataset (Wiebe, Wilson, and Cardie 2005). SUBJectivity (SUBJ) dataset (Pang and Lee 2004). Sentences Involving Compositional Knowledge (SICK) dataset (Marelli et al. 2014). |
| Dataset Splits | Yes | SST consists of 8544/1101/2210 (train/dev/test) sentences with five fine-grained labels including very positive, positive, neutral, negative and very negative. |
| Hardware Specification | Yes | All the training progress is completed on a single NVidia GTX-1080Ti GPU card with TensorFlow-1.4.0. |
| Software Dependencies | Yes | All the training progress is completed on a single NVidia GTX-1080Ti GPU card with TensorFlow-1.4.0. |
| Experiment Setup | Yes (see the configuration sketch below the table) | We set the hidden units number to 300, which is equal to the word embedding size. All weight matrices in our model are initialized by Xavier (Glorot and Bengio 2010) initialization and all biases are zero-initialized. We add dropouts between the layers of our model, and the dropout-ratio is 0.7. All the activation functions σ(·) are Exponential Linear Unit (ELU) if they are not specified. We use the sum of cross-entropy loss and L2 regularization penalty as our loss function, and the L2 regularization decay factor is 10^-7. As for learning method, we use Adadelta, which is an optimizer of stochastic gradient descent, to minimize the loss function. The batch size of training is 64 and the learning rate is 0.5. |
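Based on the Experiment Setup row above, the following minimal TensorFlow 1.x sketch shows how the reported hyperparameters could be wired together. Since no source code is released (see the Open Source Code row), the classifier head, the `sentence_repr` placeholder, and the five-class output are illustrative assumptions; only the quoted values (300 hidden units, Xavier-initialized weights, zero-initialized biases, ELU activations, dropout-ratio 0.7, L2 decay factor 10^-7, Adadelta with learning rate 0.5, batch size 64) come from the paper.

```python
import tensorflow as tf  # the paper reports TensorFlow 1.4.0; TF 1.x API is used here

# Hyperparameters quoted from the paper; the classifier head, placeholder names and
# class count below are illustrative assumptions, not the authors' (unreleased) code.
HIDDEN_UNITS = 300    # equal to the GloVe-6B-300D word embedding size
KEEP_PROB = 0.7       # "dropout-ratio is 0.7", assumed here to be the keep probability
L2_DECAY = 1e-7       # L2 regularization decay factor
LEARNING_RATE = 0.5   # Adadelta learning rate
NUM_CLASSES = 5       # e.g. the five fine-grained SST labels


def dense(inputs, units, name, activation=tf.nn.elu):
    """Fully connected layer with Xavier-initialized weights and zero-initialized biases."""
    in_dim = inputs.get_shape().as_list()[-1]
    with tf.variable_scope(name):
        w = tf.get_variable("w", [in_dim, units],
                            initializer=tf.contrib.layers.xavier_initializer())
        b = tf.get_variable("b", [units], initializer=tf.zeros_initializer())
        out = tf.matmul(inputs, w) + b
        return activation(out) if activation is not None else out


# `sentence_repr` stands in for the pooled output of the MPSAN encoder, which is
# omitted here; `labels` are one-hot class targets.
sentence_repr = tf.placeholder(tf.float32, [None, HIDDEN_UNITS], name="sentence_repr")
labels = tf.placeholder(tf.float32, [None, NUM_CLASSES], name="labels")

# Dropout between layers, ELU on hidden layers, linear logits for the softmax loss.
hidden = tf.nn.dropout(dense(sentence_repr, HIDDEN_UNITS, "hidden"), keep_prob=KEEP_PROB)
logits = dense(hidden, NUM_CLASSES, "logits", activation=None)

# Loss = cross-entropy + L2 penalty with decay factor 10^-7, as reported.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
l2_penalty = L2_DECAY * tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
loss = cross_entropy + l2_penalty

# Adadelta optimizer with learning rate 0.5; batches of 64 examples would be fed
# through `sentence_repr`/`labels` in a tf.Session training loop.
train_op = tf.train.AdadeltaOptimizer(learning_rate=LEARNING_RATE).minimize(loss)
```

Note that the excerpt does not state whether the reported dropout-ratio of 0.7 is a keep probability or a drop probability; the sketch treats it as the keep probability passed to `tf.nn.dropout`.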