Differentiated Attentive Representation Learning for Sentence Classification

Authors: Qianrong Zhou, Xiaojie Wang, Xuan Dong

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on real and synthetic datasets demonstrate the effectiveness of our model. In this section, we empirically evaluate the performance of DARLM and compare it with the state-of-the-art models.
Researcher Affiliation | Academia | Qianrong Zhou, Xiaojie Wang, Xuan Dong. Center for Intelligence Science and Technology, School of Computer Science, Beijing University of Posts and Telecommunications. {zhouqr,xjwang,dongxuan8811}@bupt.edu.cn
Pseudocode | No | The paper describes its model components and training process in text and mathematical equations, but there is no pseudocode block or algorithm listing.
Open Source Code | Yes | The codes and datasets are publicly available at https://github.com/Chanrom/DARLM.
Open Datasets | Yes | SST is a popular sentiment classification dataset introduced by Socher et al. [2013]. TREC is a question type classification dataset [Li and Roth, 2002], where questions are labeled with six classes. SUBJ is a subjectivity dataset where each snippet can be classified as subjective or objective [Pang and Lee, 2004]. MR is a movie review dataset with positive/negative labels [Pang and Lee, 2005].
Dataset Splits | Yes | Standard train/dev/test split is used. For TREC, 500 questions from the training set are randomly split off into a development set. For the remaining two datasets, the same split as [Liu et al., 2017] is followed.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed machine specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4) needed to replicate the experiment. It mentions concepts like CNN and the RMSProp optimizer, but no specific software versions.
Experiment Setup | Yes | The word embedding size, LSTM hidden size, and number of hidden units inside all fully connected layers are set to 300. Convolution window sizes are 3, 4, and 5, and each window size has 100 filters. The word embeddings are initialized with the pre-trained GloVe vectors [Pennington et al., 2014] and fine-tuned during training. Other parameters are initialized from a uniform distribution in [-0.1, 0.1]. For regularization, dropout [Srivastava et al., 2014] with a rate of 0.5 is applied to all layers (except those in the example discriminator), and batch normalization is applied to the outputs of the one-layer CNN. The model is trained using mini-batch stochastic gradient descent with the RMSProp optimizer for a total of 30 epochs. The initial learning rate is set to 0.0005 and the mini-batch size is 16. The hyper-parameter a is set to 1 for all experiments, while b is estimated by grid search over the set {2, 3, 4, 5}. The coefficient λ is empirically set to a positive or negative number depending on the dataset.
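The reported setup can be gathered into a single configuration sketch. This is an illustrative summary of the hyperparameters quoted above, not the authors' released configuration; the key names are assumptions for readability.

```python
# Hyperparameters as reported in the paper's experiment setup,
# collected into one illustrative configuration dict.
config = {
    "embedding_size": 300,          # word embedding dimension (GloVe, fine-tuned)
    "lstm_hidden_size": 300,        # LSTM hidden size
    "fc_hidden_units": 300,         # hidden units in all fully connected layers
    "conv_window_sizes": [3, 4, 5], # convolution window sizes
    "filters_per_window": 100,      # filters per window size
    "dropout_rate": 0.5,            # all layers except the example discriminator
    "init_range": (-0.1, 0.1),      # uniform init for non-embedding parameters
    "optimizer": "RMSProp",
    "epochs": 30,
    "learning_rate": 0.0005,
    "batch_size": 16,
    "a": 1,                         # fixed for all experiments
    "b_grid": [2, 3, 4, 5],         # b is chosen by grid search over this set
}

# Total number of convolution filters across all window sizes.
total_filters = len(config["conv_window_sizes"]) * config["filters_per_window"]
print(total_filters)  # 300
```

With three window sizes at 100 filters each, the CNN branch produces 300 feature maps, matching the 300-dimensional sizes used elsewhere in the model.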