Graph Convolutional Networks for Text Classification
Authors: Liang Yao, Chengsheng Mao, Yuan Luo (pp. 7370-7377)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. On the other hand, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods become more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to less training data in text classification. |
| Researcher Affiliation | Academia | Liang Yao, Chengsheng Mao, Yuan Luo Northwestern University Chicago IL 60611 {liang.yao, chengsheng.mao, yuan.luo}@northwestern.edu |
| Pseudocode | No | The paper includes mathematical equations and a schematic diagram (Figure 1), but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is available at https://github.com/yao8839836/text_gcn. |
| Open Datasets | Yes | We ran our experiments on five widely used benchmark corpora including 20-Newsgroups (20NG), Ohsumed, R52 and R8 of Reuters 21578 and Movie Review (MR). (Footnotes with URLs provided for each dataset: 1http://qwone.com/~jason/20Newsgroups/, 2http://disi.unitn.it/moschitti/corpora.htm, 3https://www.cs.umb.edu/~smimarog/textmining/datasets/, 4http://www.cs.cornell.edu/people/pabo/movie-review-data/, 5https://github.com/mnqu/PTE/tree/master/data/mr) |
| Dataset Splits | Yes | We randomly selected 10% of training set as validation set. ... The 20NG dataset... In total, 11,314 documents are in the training set and 7,532 documents are in the test set. ... 3,357 documents are in the training set and 4,043 documents are in the test set [Ohsumed]. ... R8 has 8 categories, and was split to 5,485 training and 2,189 test documents. ... R52 has 52 categories, and was split to 6,532 training and 2,568 test documents. ... We used the training/test split in (Tang, Qu, and Mei 2015) for MR. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using NLTK for stop word removal and Adam for optimization, but it does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For Text GCN, we set the embedding size of the first convolution layer as 200 and set the window size as 20. We tuned other parameters and set the learning rate as 0.02, dropout rate as 0.5, L2 loss weight as 0. We trained Text GCN for a maximum of 200 epochs using Adam (Kingma and Ba 2015) and stop training if the validation loss does not decrease for 10 consecutive epochs. |
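The experiment-setup row above describes a two-layer GCN (first convolution of embedding size 200, ReLU, then a softmax classification layer over the symmetrically normalized word-document graph). A minimal numpy sketch of that forward pass is below; the toy graph, dimensions, and function names are illustrative assumptions, not the authors' implementation (the hidden size is shrunk from 200 for readability, and training with Adam / dropout / early stopping is omitted).

```python
import numpy as np

# Sketch of the Text GCN forward pass described in the paper:
#   Z = softmax( A_hat @ ReLU( A_hat @ X @ W0 ) @ W1 )
# where A_hat = D^{-1/2} (A + I) D^{-1/2} is the normalized adjacency
# of the word-document graph. All sizes here are toy values.

def normalize_adjacency(A):
    """Return A_hat = D^{-1/2} (A + I) D^{-1/2} (self-loops added)."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                       # degree vector, > 0
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def text_gcn_forward(A_hat, X, W0, W1):
    """Two-layer GCN: graph conv + ReLU, graph conv + softmax."""
    H = np.maximum(A_hat @ X @ W0, 0.0)           # first convolution, ReLU
    logits = A_hat @ H @ W1                       # second convolution
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)   # row-wise softmax

rng = np.random.default_rng(0)
n_nodes, hidden, n_classes = 6, 4, 2
A = (rng.random((n_nodes, n_nodes)) > 0.6).astype(float)
A = np.maximum(A, A.T)                            # undirected graph
X = np.eye(n_nodes)                               # identity features, as in the paper
A_hat = normalize_adjacency(A)
W0 = rng.standard_normal((n_nodes, hidden)) * 0.1
W1 = rng.standard_normal((hidden, n_classes)) * 0.1
Z = text_gcn_forward(A_hat, X, W0, W1)
print(Z.shape)                                    # (6, 2): one class distribution per node
print(np.allclose(Z.sum(axis=1), 1.0))            # True: softmax rows sum to 1
```

In the paper's actual setup, the weights `W0` and `W1` would be trained with Adam (learning rate 0.02, dropout 0.5) for at most 200 epochs, stopping early if the validation loss fails to decrease for 10 consecutive epochs.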