Graph Convolutional Networks for Text Classification

Authors: Liang Yao, Chengsheng Mao, Yuan Luo (pp. 7370–7377)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. On the other hand, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods become more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to less training data in text classification.
Researcher Affiliation | Academia | Liang Yao, Chengsheng Mao, Yuan Luo, Northwestern University, Chicago, IL 60611, {liang.yao, chengsheng.mao, yuan.luo}@northwestern.edu
Pseudocode | No | The paper includes mathematical equations and a schematic diagram (Figure 1), but no structured pseudocode or algorithm blocks.
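Although the paper gives no pseudocode, its core computation is the standard two-layer GCN forward pass, Z = softmax(Ã ReLU(Ã X W0) W1) with Ã = D^{-1/2} A D^{-1/2}. A minimal NumPy sketch of that formula (function and variable names are illustrative, not from the paper; Text GCN uses identity features X = I, and A is assumed to already include self-loops):

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}.
    Assumes A includes self-loops, so every degree is > 0."""
    d_inv_sqrt = A.sum(axis=1) ** -0.5
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def text_gcn_forward(A, X, W0, W1):
    """Two-layer GCN: Z = softmax(A_hat @ ReLU(A_hat @ X @ W0) @ W1)."""
    A_hat = normalize_adjacency(A)
    return softmax(A_hat @ relu(A_hat @ X @ W0) @ W1)
```

Each row of the returned Z is a probability distribution over classes for one node (word or document) of the text graph.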
Open Source Code | Yes | Our source code is available at https://github.com/yao8839836/text_gcn.
Open Datasets | Yes | We ran our experiments on five widely used benchmark corpora including 20-Newsgroups (20NG), Ohsumed, R52 and R8 of Reuters 21578 and Movie Review (MR). (Footnotes with URLs provided for each dataset: 1http://qwone.com/~jason/20Newsgroups/, 2http://disi.unitn.it/moschitti/corpora.htm, 3https://www.cs.umb.edu/~smimarog/textmining/datasets/, 4http://www.cs.cornell.edu/people/pabo/movie-review-data/, 5https://github.com/mnqu/PTE/tree/master/data/mr)
Dataset Splits | Yes | We randomly selected 10% of training set as validation set. ... The 20NG dataset... In total, 11,314 documents are in the training set and 7,532 documents are in the test set. ... 3,357 documents are in the training set and 4,043 documents are in the test set [Ohsumed]. ... R8 has 8 categories, and was split to 5,485 training and 2,189 test documents. ... R52 has 52 categories, and was split to 6,532 training and 2,568 test documents. ... We used the training/test split in (Tang, Qu, and Mei 2015) for MR.
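The 10% validation hold-out described here can be sketched in a few lines. A minimal version, assuming the training documents are held in a Python sequence (the seed is illustrative; the paper does not report one):

```python
import random

def split_validation(train_docs, val_fraction=0.1, seed=42):
    """Randomly hold out a fraction of the training set as validation,
    as the paper does (10% of training documents)."""
    docs = list(train_docs)
    random.Random(seed).shuffle(docs)  # deterministic shuffle for reproducibility
    n_val = int(len(docs) * val_fraction)
    return docs[n_val:], docs[:n_val]  # (train, validation)

# With the 20NG training size reported above (11,314 documents):
train, val = split_validation(range(11314))
print(len(train), len(val))  # 10183 1131
```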
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using NLTK for stop word removal and Adam for optimization, but it does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | For Text GCN, we set the embedding size of the first convolution layer as 200 and set the window size as 20. We tuned other parameters and set the learning rate as 0.02, dropout rate as 0.5, L2 loss weight as 0. We trained Text GCN for a maximum of 200 epochs using Adam (Kingma and Ba 2015) and stop training if the validation loss does not decrease for 10 consecutive epochs.
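The reported hyperparameters and early-stopping rule are concrete enough to pin down in code. A schematic training loop, with `train_step` and `val_loss_fn` as placeholders for an actual Text GCN implementation (the function names are mine; only the numbers come from the paper):

```python
# Hyperparameters reported in the paper's experiment setup.
CONFIG = {
    "embedding_size": 200,   # output dimension of the first convolution layer
    "window_size": 20,       # sliding window for word co-occurrence statistics
    "learning_rate": 0.02,   # Adam learning rate
    "dropout": 0.5,
    "l2_weight": 0.0,
    "max_epochs": 200,
    "patience": 10,          # stop if val loss doesn't decrease for 10 epochs
}

def train_with_early_stopping(train_step, val_loss_fn, config=CONFIG):
    """Run up to max_epochs, stopping once the validation loss has not
    decreased for `patience` consecutive epochs. Returns (epochs_run,
    best_validation_loss)."""
    best_loss = float("inf")
    bad_epochs = 0
    for epoch in range(config["max_epochs"]):
        train_step(epoch)          # one epoch of optimization (placeholder)
        loss = val_loss_fn()       # current validation loss (placeholder)
        if loss < best_loss:
            best_loss, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= config["patience"]:
                break
    return epoch + 1, best_loss
```

The loop treats "does not decrease" strictly: an epoch whose loss merely equals the best so far counts toward the patience budget.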