TextGTL: Graph-based Transductive Learning for Semi-supervised Text Classification via Structure-Sensitive Interpolation

Authors: Chen Li, Xutan Peng, Hao Peng, Jianxin Li, Lihong Wang

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of TextGTL, we conduct extensive experiments on various benchmark datasets, observing significant performance gains over conventional heterogeneous graphs. In addition, we also design ablation studies to dive deep into the validity of components in TextGTL.
Researcher Affiliation | Academia | Chen Li (1,2), Xutan Peng (3), Hao Peng (1,2), Jianxin Li (1,2) and Lihong Wang (4). 1 Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, China; 2 State Key Laboratory of Software Development Environment, Beihang University, China; 3 Department of Computer Science, The University of Sheffield, UK; 4 National Computer Network Emergency Response Technical Team/Coordination Center of China, China
Pseudocode | No | The paper describes the proposed framework and its components using textual descriptions and mathematical equations, but it does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not include any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | We conduct extensive experiments on 5 benchmark text datasets: the 20-Newsgroups, Ohsumed, R52 Reuters, R8 Reuters, and Movie Review datasets.
Dataset Splits | Yes | We split a validation (verification) set with the same number of samples as the training set from each dataset, and ensure that there is no sample imbalance in the testing set. (A minimal split sketch appears after this table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The initial features of the first-layer nodes in the three different text graphs are all TF-IDF attributes from TfIdfCounter in sklearn, the output feature dimension is 200, the input feature dimension of the second-layer nodes is 200 × 3, and the output feature dimension is the number of classes. During training, the dropout rate is 0.5 and the L2 loss weight is 5e-6. We use the Adam optimizer with a maximum of 200 epochs and a learning rate of 0.002; when the validation loss does not decrease for 10 consecutive epochs, early stopping is triggered.
Experiment Setup | Yes | We use K = 4 kinds of initial document embeddings in the semantics text graph, derived from [Pörner and Schütze, 2019], set the length of the sliding window to 20, and the dependency parser used in the syntax text graph is Stanford CoreNLP. We equip TextGTL with Ego-Splitting [Epasto et al., 2017] as the overlapping clustering algorithm (resolution = 1.0). During data augmentation, we select GBDT as a simple classifier and iterate 10 rounds to obtain more labeled super nodes to add to the training set. As mentioned in Section 3.3, in this study we adopt a two-layer GCN. The initial features of the first-layer nodes in the three different text graphs are all TF-IDF attributes from TfIdfCounter in sklearn, the output feature dimension is 200, the input feature dimension of the second-layer nodes is 200 × 3, and the output feature dimension is the number of classes. During training, the dropout rate is 0.5 and the L2 loss weight is 5e-6. We use the Adam optimizer with a maximum of 200 epochs and a learning rate of 0.002; when the validation loss does not decrease for 10 consecutive epochs, early stopping is triggered. (Hedged sketches of this training configuration and of the GBDT augmentation loop follow below.)
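
The Dataset Splits row reports that the validation (verification) set is drawn with the same number of samples as the training set. Since no splitting code is released, the following is only a minimal sketch of such a split with scikit-learn; the function name split_labeled_pool, the stratification choice, and the fixed seed are assumptions, not the authors' procedure.

```python
from sklearn.model_selection import train_test_split

def split_labeled_pool(doc_ids, labels, train_size, seed=42):
    """Draw a labeled training set and an equally sized validation set.

    Only the equal-size constraint comes from the paper; stratification and
    the fixed seed are assumptions of this sketch.
    """
    train_ids, val_ids, train_y, val_y = train_test_split(
        doc_ids, labels,
        train_size=train_size,   # number of labeled training documents
        test_size=train_size,    # validation set of the same size
        stratify=labels,         # keep the class distribution balanced
        random_state=seed,
    )
    return (train_ids, train_y), (val_ids, val_y)
```

Stratified sampling is one simple way to honor the "no sample imbalance" remark; the paper does not state how balance is actually enforced.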
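The Experiment Setup row fixes most training hyperparameters: 200-dimensional first-layer outputs per graph, a 200 × 3 input to the second layer, dropout 0.5, an L2 weight of 5e-6, Adam with learning rate 0.002, at most 200 epochs, and early stopping after 10 stagnant validation epochs. The authors' code is not available, so this PyTorch sketch only mirrors those numbers; the dense-adjacency GCN layer, the reuse of the first graph's adjacency in the second layer, realizing the L2 term through Adam's weight_decay, and all tensor names are assumptions. The quoted "TfIdfCounter in sklearn" most closely matches sklearn's TfidfVectorizer for building the TF-IDF node features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseGCNLayer(nn.Module):
    """Simplified GCN layer: H' = A_hat @ H @ W, with A_hat pre-normalized."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, a_hat, h):
        return a_hat @ self.lin(h)

class TwoLayerTextGCN(nn.Module):
    """Two-layer GCN over three text graphs (semantics / syntax / sequential).

    Layer 1 maps TF-IDF features to 200 dims on each graph; the three outputs
    are concatenated (200 * 3) and layer 2 maps them to class logits, as the
    Experiment Setup row describes. Reusing the first graph's adjacency for
    the second layer is an assumption of this sketch.
    """
    def __init__(self, in_dim, num_classes, hidden=200, dropout=0.5):
        super().__init__()
        self.first = nn.ModuleList([DenseGCNLayer(in_dim, hidden) for _ in range(3)])
        self.second = DenseGCNLayer(hidden * 3, num_classes)
        self.dropout = dropout

    def forward(self, adjs, feats):
        # adjs / feats: lists with one normalized adjacency matrix and one
        # TF-IDF feature matrix per text graph.
        hs = [F.relu(layer(a, x)) for layer, a, x in zip(self.first, adjs, feats)]
        h = F.dropout(torch.cat(hs, dim=-1), self.dropout, self.training)
        return self.second(adjs[0], h)

def train(model, adjs, feats, labels, train_idx, val_idx, epochs=200, patience=10):
    # L2 weight 5e-6 realized through Adam's weight_decay (an assumption).
    opt = torch.optim.Adam(model.parameters(), lr=2e-3, weight_decay=5e-6)
    best_val, stale = float("inf"), 0
    for _ in range(epochs):
        model.train()
        opt.zero_grad()
        loss = F.cross_entropy(model(adjs, feats)[train_idx], labels[train_idx])
        loss.backward()
        opt.step()

        model.eval()
        with torch.no_grad():
            val_loss = F.cross_entropy(model(adjs, feats)[val_idx], labels[val_idx]).item()
        if val_loss < best_val:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:  # early stop after 10 stagnant validation epochs
                break
    return model
```

A call would look like train(TwoLayerTextGCN(in_dim=vocab_size, num_classes=8), adjs, feats, labels, train_idx, val_idx) for a hypothetical 8-class setup; the document/super-node construction of the three graphs is outside the scope of this sketch.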
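The same row states that GBDT is used as a simple classifier and iterated for 10 rounds to add more labeled super nodes to the training set, but it does not say how pseudo-labels are accepted. The sketch below is therefore only one plausible reading, a confidence-thresholded self-training loop built on scikit-learn's GradientBoostingClassifier; the 0.9 threshold and the acceptance rule are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def augment_with_gbdt(features, labels, labeled_idx, unlabeled_idx,
                      rounds=10, threshold=0.9):
    """Iterative pseudo-labeling of super nodes with a GBDT classifier.

    features: 2-D numpy array of super-node features.
    The 10 rounds come from the paper; the confidence threshold and the
    acceptance rule are assumptions made for this sketch.
    """
    labels = np.asarray(labels).copy()
    labeled_idx = list(labeled_idx)
    unlabeled_idx = list(unlabeled_idx)
    for _ in range(rounds):
        if not unlabeled_idx:
            break
        clf = GradientBoostingClassifier()
        clf.fit(features[labeled_idx], labels[labeled_idx])
        proba = clf.predict_proba(features[unlabeled_idx])
        confident = proba.max(axis=1) >= threshold
        picked = [i for i, ok in zip(unlabeled_idx, confident) if ok]
        if not picked:
            break
        # Accept the most probable class as the pseudo-label for each node.
        labels[picked] = clf.classes_[proba[confident].argmax(axis=1)]
        labeled_idx.extend(picked)
        picked_set = set(picked)
        unlabeled_idx = [i for i in unlabeled_idx if i not in picked_set]
    return labels, labeled_idx
```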