TextGTL: Graph-based Transductive Learning for Semi-supervised Text Classification via Structure-Sensitive Interpolation

Authors: Chen Li, Xutan Peng, Hao Peng, Jianxin Li, Lihong Wang

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of TextGTL, we conduct extensive experiments on various benchmark datasets, observing significant performance gains over conventional heterogeneous graphs. In addition, we also design ablation studies to dive deep into the validity of components in TextGTL.
Researcher Affiliation | Academia | Chen Li (1,2), Xutan Peng (3), Hao Peng (1,2), Jianxin Li (1,2) and Lihong Wang (4). 1 Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, China; 2 State Key Laboratory of Software Development Environment, Beihang University, China; 3 Department of Computer Science, The University of Sheffield, UK; 4 National Computer Network Emergency Response Technical Team/Coordination Center of China, China
Pseudocode | No | The paper describes the proposed framework and its components using textual descriptions and mathematical equations, but it does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not include any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | We conduct extensive experiments on 5 benchmark text datasets: the 20-Newsgroups, Ohsumed, R52 Reuters, R8 Reuters, and Movie Review datasets.
Dataset Splits | Yes | We split a validation (verification) set with the same number of samples as the training set from each dataset, and ensure that there is no sample imbalance in the testing set. (A minimal split sketch appears after this table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The initial features of the first-layer nodes in the three different text graphs are all TF-IDF attributes from TfIdfCounter in sklearn, the output feature dimension is 200, the input feature dimension of the second-layer nodes is 200 × 3, and the output feature dimension is the number of classes. During training, the dropout rate is 0.5 and the L2 loss weight is 5e-6. We use the Adam optimizer with a maximum of 200 epochs and a learning rate of 0.002; when the validation loss does not decrease for 10 consecutive epochs, early stopping is triggered.
Experiment Setup | Yes | We use K = 4 kinds of initial document embeddings in the semantics text graph, derived from [Pörner and Schütze, 2019], set the length of the sliding window to 20, and the dependency parser used in the syntax text graph is Stanford CoreNLP. We equip TextGTL with Ego-Splitting [Epasto et al., 2017] as the overlapping clustering algorithm (resolution = 1.0). During data augmentation, we select GBDT as a simple classifier and iterate 10 rounds to obtain more labeled super nodes to add to the training set. As mentioned in Section 3.3, in this study we adopt a two-layer GCN. The initial features of the first-layer nodes in the three different text graphs are all TF-IDF attributes from TfIdfCounter in sklearn, the output feature dimension is 200, the input feature dimension of the second-layer nodes is 200 × 3, and the output feature dimension is the number of classes. During training, the dropout rate is 0.5 and the L2 loss weight is 5e-6. We use the Adam optimizer with a maximum of 200 epochs and a learning rate of 0.002; when the validation loss does not decrease for 10 consecutive epochs, early stopping is triggered. (Hedged sketches of this training configuration and of the GBDT augmentation loop follow below.)
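
The Dataset Splits row reports that the validation (verification) set is drawn with the same number of samples as the training set. Since no splitting code is released, the following is only a minimal sketch of such a split with scikit-learn; the function name split_labeled_pool, the stratification choice, and the fixed seed are assumptions, not the authors' procedure.

```python
from sklearn.model_selection import train_test_split

def split_labeled_pool(doc_ids, labels, train_size, seed=42):
    """Draw a labeled training set and an equally sized validation set.

    Only the equal-size constraint comes from the paper; stratification and
    the fixed seed are assumptions of this sketch.
    """
    train_ids, val_ids, train_y, val_y = train_test_split(
        doc_ids, labels,
        train_size=train_size,   # number of labeled training documents
        test_size=train_size,    # validation set of the same size
        stratify=labels,         # keep the class distribution balanced
        random_state=seed,
    )
    return (train_ids, train_y), (val_ids, val_y)
```

Stratified sampling is one simple way to honor the "no sample imbalance" remark; the paper does not state how balance is actually enforced.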
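The Experiment Setup row fixes most training hyperparameters: 200-dimensional first-layer outputs per graph, a 200 × 3 input to the second layer, dropout 0.5, an L2 weight of 5e-6, Adam with learning rate 0.002, at most 200 epochs, and early stopping after 10 stagnant validation epochs. The authors' code is not available, so this PyTorch sketch only mirrors those numbers; the dense-adjacency GCN layer, the reuse of the first graph's adjacency in the second layer, realizing the L2 term through Adam's weight_decay, and all tensor names are assumptions. The quoted "TfIdfCounter in sklearn" most closely matches sklearn's TfidfVectorizer for building the TF-IDF node features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseGCNLayer(nn.Module):
    """Simplified GCN layer: H' = A_hat @ H @ W, with A_hat pre-normalized."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, a_hat, h):
        return a_hat @ self.lin(h)

class TwoLayerTextGCN(nn.Module):
    """Two-layer GCN over three text graphs (semantics / syntax / sequential).

    Layer 1 maps TF-IDF features to 200 dims on each graph; the three outputs
    are concatenated (200 * 3) and layer 2 maps them to class logits, as the
    Experiment Setup row describes. Reusing the first graph's adjacency for
    the second layer is an assumption of this sketch.
    """
    def __init__(self, in_dim, num_classes, hidden=200, dropout=0.5):
        super().__init__()
        self.first = nn.ModuleList([DenseGCNLayer(in_dim, hidden) for _ in range(3)])
        self.second = DenseGCNLayer(hidden * 3, num_classes)
        self.dropout = dropout

    def forward(self, adjs, feats):
        # adjs / feats: lists with one normalized adjacency matrix and one
        # TF-IDF feature matrix per text graph.
        hs = [F.relu(layer(a, x)) for layer, a, x in zip(self.first, adjs, feats)]
        h = F.dropout(torch.cat(hs, dim=-1), self.dropout, self.training)
        return self.second(adjs[0], h)

def train(model, adjs, feats, labels, train_idx, val_idx, epochs=200, patience=10):
    # L2 weight 5e-6 realized through Adam's weight_decay (an assumption).
    opt = torch.optim.Adam(model.parameters(), lr=2e-3, weight_decay=5e-6)
    best_val, stale = float("inf"), 0
    for _ in range(epochs):
        model.train()
        opt.zero_grad()
        loss = F.cross_entropy(model(adjs, feats)[train_idx], labels[train_idx])
        loss.backward()
        opt.step()

        model.eval()
        with torch.no_grad():
            val_loss = F.cross_entropy(model(adjs, feats)[val_idx], labels[val_idx]).item()
        if val_loss < best_val:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:  # early stop after 10 stagnant validation epochs
                break
    return model
```

A call would look like train(TwoLayerTextGCN(in_dim=vocab_size, num_classes=8), adjs, feats, labels, train_idx, val_idx) for a hypothetical 8-class setup; the document/super-node construction of the three graphs is outside the scope of this sketch.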
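The same row states that GBDT is used as a simple classifier and iterated for 10 rounds to add more labeled super nodes to the training set, but it does not say how pseudo-labels are accepted. The sketch below is therefore only one plausible reading, a confidence-thresholded self-training loop built on scikit-learn's GradientBoostingClassifier; the 0.9 threshold and the acceptance rule are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def augment_with_gbdt(features, labels, labeled_idx, unlabeled_idx,
                      rounds=10, threshold=0.9):
    """Iterative pseudo-labeling of super nodes with a GBDT classifier.

    features: 2-D numpy array of super-node features.
    The 10 rounds come from the paper; the confidence threshold and the
    acceptance rule are assumptions made for this sketch.
    """
    labels = np.asarray(labels).copy()
    labeled_idx = list(labeled_idx)
    unlabeled_idx = list(unlabeled_idx)
    for _ in range(rounds):
        if not unlabeled_idx:
            break
        clf = GradientBoostingClassifier()
        clf.fit(features[labeled_idx], labels[labeled_idx])
        proba = clf.predict_proba(features[unlabeled_idx])
        confident = proba.max(axis=1) >= threshold
        picked = [i for i, ok in zip(unlabeled_idx, confident) if ok]
        if not picked:
            break
        # Accept the most probable class as the pseudo-label for each node.
        labels[picked] = clf.classes_[proba[confident].argmax(axis=1)]
        labeled_idx.extend(picked)
        picked_set = set(picked)
        unlabeled_idx = [i for i in unlabeled_idx if i not in picked_set]
    return labels, labeled_idx
```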