Transformable Convolutional Neural Network for Text Classification
Authors: Liqiang Xiao, Honglun Zhang, Wenqing Chen, Yongkun Wang, Yaohui Jin
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test proposed modules on two state-of-the-art models, and the results demonstrate that our modules can effectively adapt to the feature transformation in text classification. |
| Researcher Affiliation | Academia | 1 State Key Lab of Advanced Optical Communication System and Network, Shanghai Jiao Tong University 2 Artificial Intelligence Institute, Shanghai Jiao Tong University 3 Network and Information Center, Shanghai Jiao Tong University {xiaoliqiang, zhanghonglun, wenqingchen, ykw, jinyh}@sjtu.edu.cn |
| Pseudocode | No | The paper includes illustrations of its mechanisms (Figure 1, 2, 3) but does not provide formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions 'https://code.google.com/p/word2vec/' for Word2Vec, which is a third-party tool used, not the authors' own source code for their proposed method. No statement or link is provided for the authors' implementation. |
| Open Datasets | Yes | We extensively evaluate our deformable Conv Nets on 9 datasets, which are collected in different domains with different labels. Their statistics are listed in Table 1. SST-1, SST-2 [Socher et al., 2013] and IMDB [Maas et al., 2011] are about movie reviews... SUBJ is a subjectivity dataset... [Pang et al., 2004]; TREC dataset... [Li and Roth, 2002] ...They are derived from the raw data published by [Blitzer et al., 2007]. |
| Dataset Splits | Yes | Table 1: Statistics of the text classification datasets. Train, Dev. and Test denote the size of train, development and test set respectively; Voc.: Vocabulary size; Len.: Average sentence length. (Table includes specific numbers for Train, Dev., and Test splits for each dataset) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | For TF-MCCNN, as original, we use word vectors from Word2Vec... Training is implemented through SGD (stochastic gradient descent) with the Adadelta update rule... For TF-DCNN... gradient-based optimization is performed using the Adagrad update rule. (Software names such as Word2Vec, SGD, Adadelta, and Adagrad are mentioned, but no version numbers are given for any of them.) |
| Experiment Setup | Yes | For training, all involved parameters are randomly initialized from a truncated normal distribution with zero mean and standard deviation. And the learning rates are 10⁻³ and 10⁻⁴ for the first 2/3 and last 1/3 of iterations. Mini-batches are generated by randomly selecting 50 samples every time from the corpus. Training is implemented through SGD (stochastic gradient descent) with the Adadelta update rule [Zeiler, 2012]. For TF-DCNN, parameters are normally initialized in [-0.1, 0.1]... The network is trained with mini-batches of size 50 by back-propagation and the gradient-based optimization is performed using the Adagrad update rule [Duchi et al., 2011]. Learning rates are also set to 10⁻³ and 10⁻⁴ for the first 2/3 and last 1/3 of iterations. |
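
Taken together, the Experiment Setup and Software Dependencies rows describe a complete training recipe: truncated-normal initialization, mini-batches of 50, a two-stage learning rate schedule (10⁻³ then 10⁻⁴), and Adadelta (TF-MCCNN) or Adagrad (TF-DCNN) updates. Since the authors release no code, the sketch below reconstructs that recipe in PyTorch under loudly labeled assumptions: the placeholder model and the values of `vocab_size`, `seq_len`, `n_classes`, `total_iters`, and the truncated-normal standard deviation are all invented for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of the paper's stated training recipe, NOT the authors'
# implementation. The model architecture and all sizes below are assumptions.
vocab_size, embed_dim, seq_len, n_classes = 20_000, 300, 50, 5

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Flatten(),                               # (batch, seq_len * embed_dim)
    nn.Linear(embed_dim * seq_len, n_classes),  # placeholder for TF-MCCNN/TF-DCNN
)

# Paper: parameters drawn from a truncated normal with zero mean; the std
# value is omitted in the paper, so std=0.1 is an assumption.
for p in model.parameters():
    if p.dim() > 1:
        nn.init.trunc_normal_(p, mean=0.0, std=0.1)

criterion = nn.CrossEntropyLoss()
# Paper: SGD with the Adadelta update rule (Adagrad for the TF-DCNN variant).
optimizer = torch.optim.Adadelta(model.parameters(), lr=1e-3)

total_iters = 30_000  # assumed; the paper does not report a total iteration count
batch_size = 50       # stated in the paper

for it in range(total_iters):
    # Paper: lr = 1e-3 for the first 2/3 of iterations, 1e-4 for the last 1/3.
    if it == (2 * total_iters) // 3:
        for group in optimizer.param_groups:
            group["lr"] = 1e-4

    # Stand-in mini-batch; the paper samples 50 examples at random from the corpus.
    x = torch.randint(0, vocab_size, (batch_size, seq_len))
    y = torch.randint(0, n_classes, (batch_size,))

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```

Replacing `torch.optim.Adadelta` with `torch.optim.Adagrad` and the truncated-normal init with a [-0.1, 0.1] init would give the TF-DCNN variant of the same schedule.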