DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

Authors: Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, Chengqi Zhang

AAAI 2018

Each entry below lists the reproducibility variable, the result, and the supporting LLM response:

Research Type: Experimental
  "In experiments, we compare DiSAN with the currently popular methods on various NLP tasks, e.g., natural language inference, sentiment analysis, sentence classification, etc. DiSAN achieves the highest test accuracy on the Stanford Natural Language Inference (SNLI) dataset among sentence-encoding models and improves the currently best result by 1.02%."

Researcher Affiliation: Academia
  Centre of Artificial Intelligence, FEIT, University of Technology Sydney; Paul G. Allen School of Computer Science & Engineering, University of Washington

Pseudocode: No
  The paper does not contain structured pseudocode or algorithm blocks. It provides architectural diagrams but no step-by-step algorithmic descriptions.

Open Source Code: Yes
  "Codes and pre-trained models for experiments can be found at https://github.com/taoshen58/DiSAN"

Open Datasets: Yes
  "We compare different models on a widely used benchmark, Stanford Natural Language Inference (SNLI) (Bowman et al. 2015) dataset... We use Stanford Sentiment Treebank (SST) (Socher et al. 2013)... Multi-Genre Natural Language Inference (MultiNLI) (Williams, Nangia, and Bowman 2017) dataset... Sentences Involving Compositional Knowledge (SICK) dataset (Marelli et al. 2014)... CR: Customer review (Hu and Liu 2004); MPQA: Opinion polarity detection subtask of the MPQA dataset (Wiebe, Wilson, and Cardie 2005); SUBJ: Subjectivity dataset (Pang and Lee 2004); TREC: TREC question-type classification dataset (Li and Roth 2002)."

Dataset Splits: Yes
  "Stanford Natural Language Inference (SNLI) (Bowman et al. 2015) dataset, which consists of 549,367/9,842/9,824 (train/dev/test) premise-hypothesis pairs with labels... Stanford Sentiment Treebank (SST) (Socher et al. 2013)... with 8,544/1,101/2,210 samples... SICK is composed of 9,927 sentence pairs with 4,500/500/4,927 instances for train/dev/test."

Hardware Specification: Yes
  "All models are implemented with TensorFlow and run on single Nvidia GTX 1080Ti graphic card."

Software Dependencies: No
  The paper states "All models are implemented with TensorFlow" but does not provide a specific version number for TensorFlow or any other software dependencies.

Experiment Setup: Yes
  "Training Setup: We use cross-entropy loss plus L2 regularization penalty as optimization objective. We minimize it by Adadelta (Zeiler 2012) (an optimizer of mini-batch SGD) with batch size of 64. We use Adadelta rather than Adam (Kingma and Ba 2015) because in our experiments, DiSAN optimized by Adadelta can achieve more stable performance than Adam optimized one. Initial learning rate is set to 0.5. All weight matrices are initialized by Glorot Initialization (Glorot and Bengio 2010), and the biases are initialized with 0. ... We use Dropout (Srivastava et al. 2014) with keep probability 0.75 for language inference and 0.8 for sentiment analysis. The L2 regularization decay factors γ are 5 × 10⁻⁵ and 10⁻⁴ for language inference and sentiment analysis, respectively. ... Hidden units number d_h is set to 300. Activation functions σ(·) are ELU (exponential linear unit) (Clevert, Unterthiner, and Hochreiter 2016) if not specified."
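To make the quoted training setup concrete, here is a minimal, hedged sketch in plain Python of the paper's stated objective (cross-entropy loss plus an L2 penalty) and the ELU activation, with the quoted hyperparameters collected as constants. This is not the authors' TensorFlow code; all function names are our own, and the scalar-level implementation is purely illustrative.

```python
import math

# Hyperparameters quoted from the paper's training setup.
BATCH_SIZE = 64
INIT_LEARNING_RATE = 0.5    # for Adadelta
KEEP_PROB_NLI = 0.75        # dropout keep probability, language inference
KEEP_PROB_SENTIMENT = 0.8   # dropout keep probability, sentiment analysis
L2_DECAY_NLI = 5e-5         # L2 decay factor (gamma), language inference
L2_DECAY_SENTIMENT = 1e-4   # L2 decay factor (gamma), sentiment analysis
HIDDEN_UNITS = 300          # d_h


def elu(x, alpha=1.0):
    """ELU activation (Clevert, Unterthiner, and Hochreiter 2016):
    identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def objective(logits, label, weights, gamma):
    """Cross-entropy loss plus L2 regularization penalty for one example.

    logits:  raw class scores, a list of floats
    label:   index of the true class
    weights: flat list of weight values to penalize
    gamma:   L2 regularization decay factor
    """
    # Numerically stable softmax cross-entropy.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    cross_entropy = log_z - logits[label]
    # L2 penalty scaled by the decay factor.
    l2_penalty = gamma * sum(w * w for w in weights)
    return cross_entropy + l2_penalty
```

For example, with three uniform logits the cross-entropy term reduces to log 3, and the penalty term adds gamma times the sum of squared weights; in training, this per-example objective would be averaged over a mini-batch of 64 and minimized with Adadelta.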