Graph Convolutional Networks for Text Classification

Authors: Liang Yao, Chengsheng Mao, Yuan Luo (pp. 7370–7377)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. On the other hand, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods become more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to less training data in text classification.
Researcher Affiliation | Academia | Liang Yao, Chengsheng Mao, Yuan Luo, Northwestern University, Chicago, IL 60611, {liang.yao, chengsheng.mao, yuan.luo}@northwestern.edu
Pseudocode | No | The paper includes mathematical equations and a schematic diagram (Figure 1), but no structured pseudocode or algorithm blocks.
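Although the paper gives no pseudocode, its core computation is the standard two-layer GCN forward pass, Z = softmax(Ã ReLU(Ã X W0) W1) with Ã = D^{-1/2} A D^{-1/2}. A minimal NumPy sketch of that formula (function and variable names are illustrative, not from the paper; Text GCN uses identity features X = I, and A is assumed to already include self-loops):

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}.
    Assumes A includes self-loops, so every degree is > 0."""
    d_inv_sqrt = A.sum(axis=1) ** -0.5
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def text_gcn_forward(A, X, W0, W1):
    """Two-layer GCN: Z = softmax(A_hat @ ReLU(A_hat @ X @ W0) @ W1)."""
    A_hat = normalize_adjacency(A)
    return softmax(A_hat @ relu(A_hat @ X @ W0) @ W1)
```

Each row of the returned Z is a probability distribution over classes for one node (word or document) of the text graph.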
Open Source Code | Yes | Our source code is available at https://github.com/yao8839836/text_gcn.
Open Datasets | Yes | We ran our experiments on five widely used benchmark corpora including 20-Newsgroups (20NG), Ohsumed, R52 and R8 of Reuters 21578 and Movie Review (MR). (Footnotes with URLs provided for each dataset: 1http://qwone.com/~jason/20Newsgroups/, 2http://disi.unitn.it/moschitti/corpora.htm, 3https://www.cs.umb.edu/~smimarog/textmining/datasets/, 4http://www.cs.cornell.edu/people/pabo/movie-review-data/, 5https://github.com/mnqu/PTE/tree/master/data/mr)
Dataset Splits | Yes | We randomly selected 10% of training set as validation set. ... The 20NG dataset... In total, 11,314 documents are in the training set and 7,532 documents are in the test set. ... 3,357 documents are in the training set and 4,043 documents are in the test set [Ohsumed]. ... R8 has 8 categories, and was split to 5,485 training and 2,189 test documents. ... R52 has 52 categories, and was split to 6,532 training and 2,568 test documents. ... We used the training/test split in (Tang, Qu, and Mei 2015) for MR.
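The 10% validation hold-out described here can be sketched in a few lines. A minimal version, assuming the training documents are held in a Python sequence (the seed is illustrative; the paper does not report one):

```python
import random

def split_validation(train_docs, val_fraction=0.1, seed=42):
    """Randomly hold out a fraction of the training set as validation,
    as the paper does (10% of training documents)."""
    docs = list(train_docs)
    random.Random(seed).shuffle(docs)  # deterministic shuffle for reproducibility
    n_val = int(len(docs) * val_fraction)
    return docs[n_val:], docs[:n_val]  # (train, validation)

# With the 20NG training size reported above (11,314 documents):
train, val = split_validation(range(11314))
print(len(train), len(val))  # 10183 1131
```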
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using NLTK for stop word removal and Adam for optimization, but it does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | For Text GCN, we set the embedding size of the first convolution layer as 200 and set the window size as 20. We tuned other parameters and set the learning rate as 0.02, dropout rate as 0.5, L2 loss weight as 0. We trained Text GCN for a maximum of 200 epochs using Adam (Kingma and Ba 2015) and stop training if the validation loss does not decrease for 10 consecutive epochs.
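The reported hyperparameters and early-stopping rule are concrete enough to pin down in code. A schematic training loop, with `train_step` and `val_loss_fn` as placeholders for an actual Text GCN implementation (the function names are mine; only the numbers come from the paper):

```python
# Hyperparameters reported in the paper's experiment setup.
CONFIG = {
    "embedding_size": 200,   # output dimension of the first convolution layer
    "window_size": 20,       # sliding window for word co-occurrence statistics
    "learning_rate": 0.02,   # Adam learning rate
    "dropout": 0.5,
    "l2_weight": 0.0,
    "max_epochs": 200,
    "patience": 10,          # stop if val loss doesn't decrease for 10 epochs
}

def train_with_early_stopping(train_step, val_loss_fn, config=CONFIG):
    """Run up to max_epochs, stopping once the validation loss has not
    decreased for `patience` consecutive epochs. Returns (epochs_run,
    best_validation_loss)."""
    best_loss = float("inf")
    bad_epochs = 0
    for epoch in range(config["max_epochs"]):
        train_step(epoch)          # one epoch of optimization (placeholder)
        loss = val_loss_fn()       # current validation loss (placeholder)
        if loss < best_loss:
            best_loss, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= config["patience"]:
                break
    return epoch + 1, best_loss
```

The loop treats "does not decrease" strictly: an epoch whose loss merely equals the best so far counts toward the patience budget.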