Transformable Convolutional Neural Network for Text Classification

Authors: Liqiang Xiao, Honglun Zhang, Wenqing Chen, Yongkun Wang, Yaohui Jin

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test proposed modules on two state-of-the-art models, and the results demonstrate that our modules can effectively adapt to the feature transformation in text classification. (Section 4, Experiments; Section 4.1, Datasets)
Researcher Affiliation | Academia | 1 State Key Lab of Advanced Optical Communication System and Network, Shanghai Jiao Tong University; 2 Artificial Intelligence Institute, Shanghai Jiao Tong University; 3 Network and Information Center, Shanghai Jiao Tong University; {xiaoliqiang, zhanghonglun, wenqingchen, ykw, jinyh}@sjtu.edu.cn
Pseudocode | No | The paper includes illustrations of its mechanisms (Figures 1, 2, and 3) but does not provide formal pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions 'https://code.google.com/p/word2vec/' for Word2Vec, which is a third-party tool, not the authors' own source code for the proposed method. No statement or link is provided for the authors' implementation.
Open Datasets | Yes | We extensively evaluate our deformable Conv Nets on 9 datasets, which are collected in different domains with different labels. Their statistics are listed in Table 1. SST-1, SST-2 [Socher et al., 2013] and IMDB [Maas et al., 2011] are about movie reviews... SUBJ is a subjectivity dataset... [Pang et al., 2004]; TREC dataset... [Li and Roth, 2002] ...They are derived from the raw data published by [Blitzer et al., 2007].
Dataset Splits | Yes | Table 1: Statistics of the text classification datasets. Train, Dev. and Test denote the size of train, development and test set respectively; Voc.: Vocabulary size; Len.: Average sentence length. (The table includes specific numbers for the Train, Dev., and Test splits of each dataset.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | For TF-MCCNN, as original, we use word vectors from Word2Vec... Training is implemented through SGD (stochastic gradient descent) with the Adadelta update rule... For TF-DCNN... gradient-based optimization is performed using the Adagrad update rule. (Software names like Word2Vec, SGD, Adadelta, and Adagrad are mentioned, but specific version numbers are not provided for any of them.)
Experiment Setup | Yes | For training, all involved parameters are randomly initialized from a truncated normal distribution with zero mean and standard deviation. And the learning rates are 10⁻³, 10⁻⁴ for the first 2/3 and last 1/3 of iterations. Mini-batch is generated by randomly selecting 50 samples every time from the corpus. Training is implemented through SGD (stochastic gradient descent) with the Adadelta update rule [Zeiler, 2012]. For TF-DCNN, parameters are normally initialized in [-0.1, 0.1]... The network is trained with mini-batches of size 50 by back-propagation and the gradient-based optimization is performed using the Adagrad update rule [Duchi et al., 2011]. Learning rates are also set to 10⁻³, 10⁻⁴ for the first 2/3 and last 1/3 of iterations. (A hedged configuration sketch based on these details follows this table.)
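
The setup quoted in the Experiment Setup row can be summarized as a minimal configuration sketch. The code below is not the authors' implementation (none is released, per the Open Source Code row); it assumes TensorFlow/Keras, a hypothetical total iteration count, and a guessed standard deviation for the truncated-normal initializer, and it only illustrates the quoted learning-rate schedule, batch size, and update rules.

```python
# Minimal sketch of the training configuration quoted above; TensorFlow/Keras
# is an assumption, as are TOTAL_ITERS and the initializer's standard deviation.
import tensorflow as tf

TOTAL_ITERS = 30_000   # hypothetical; the paper does not state a total iteration count
BATCH_SIZE = 50        # "mini-batches of size 50"

# Truncated-normal initialization with zero mean (the quoted text omits the
# standard deviation, so 0.1 is a placeholder).
initializer = tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=0.1)

# Learning rate: 10^-3 for the first 2/3 of iterations, 10^-4 for the last 1/3.
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[2 * TOTAL_ITERS // 3],
    values=[1e-3, 1e-4],
)

# TF-MCCNN variant: SGD with the Adadelta update rule [Zeiler, 2012].
optimizer_mccnn = tf.keras.optimizers.Adadelta(learning_rate=lr_schedule)

# TF-DCNN variant: Adagrad update rule [Duchi et al., 2011] with the same schedule.
optimizer_dcnn = tf.keras.optimizers.Adagrad(learning_rate=lr_schedule)
```

A single piecewise schedule serves both variants here, since the paper reports the same 10⁻³/10⁻⁴ learning rates for TF-MCCNN and TF-DCNN.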