Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Text Classification with Heterogeneous Information Network Kernels

Authors: Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, Jiawei Han

AAAI 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Using Freebase, a well-known world knowledge base, to construct HIN for texts, our experiments on two benchmark datasets show that the indefinite HIN-kernel based on weighted meta-paths outperforms the state-of-the-art methods and other HIN-kernels. and Experiments In this section, we show empirically how to incorporate external knowledge into the HIN-kernels.
Researcher Affiliation	Academia	Chenguang Wang (a), Yangqiu Song (b), Haoran Li (a), Ming Zhang (a), Jiawei Han (c). (a) School of EECS, Peking University; (b) Lane Department of Computer Science and Electrical Engineering, West Virginia University; (c) Department of Computer Science, University of Illinois at Urbana-Champaign
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	Datasets We derive four classification problems from the two benchmark datasets as follow. 20Newsgroups (20NG): In the spirit of (Basu, Bilenko, and Mooney 2004), two datasets are created by selecting three categories from 20NG. RCV1: We derive two subsets of RCV1 (Lewis et al. 2004) from the top category GCAT (Government/Social).
Dataset Splits	Yes	Each data split has three binary classification tasks. For each task, the corresponding data is randomly divided into 80% training and 20% testing data. We apply 5-fold cross validation on the training set to determine the optimal hyperparameter C for SVM and SVMHIN.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies	No	The paper mentions Word2Vec, Naive Bayes, and SVM, but does not specify version numbers for any software libraries or dependencies used for implementation.
Experiment Setup	Yes	The parameters C and ρ for indefinite SVM are tuned based on the 5-fold cross validation and the Nesterov's efficient smooth optimization method (Nesterov 2005) is terminated if the value of the object function changes less than 10^-6 following (Ying, Campbell, and Girolami 2009).
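
The protocol quoted in the Dataset Splits row (a random 80%/20% division, then 5-fold cross validation on the training portion to choose C) can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code. The function names `split_80_20`, `k_folds`, and `select_c`, the seed, and the `score_fn` callback (a stand-in for fitting an SVM on the `fit` indices and scoring it on the `val` indices) are all assumptions introduced here.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def split_80_20(n_items: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly divide item indices into 80% training and 20% testing."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    cut = int(round(0.8 * n_items))
    return indices[:cut], indices[cut:]


def k_folds(train_idx: Sequence[int], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (fit, validation) index pairs covering the training indices."""
    folds = [list(train_idx[i::k]) for i in range(k)]
    pairs = []
    for i in range(k):
        val = folds[i]
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        pairs.append((fit, val))
    return pairs


def select_c(
    train_idx: Sequence[int],
    candidates: Sequence[float],
    score_fn: Callable[[float, List[int], List[int]], float],
    k: int = 5,
) -> float:
    """Pick the candidate C with the best mean validation score over k folds.

    score_fn(c, fit, val) is a hypothetical callback: it should train a
    classifier with hyperparameter c on `fit` and return its score on `val`.
    """
    pairs = k_folds(train_idx, k)

    def cv_score(c: float) -> float:
        return mean(score_fn(c, fit, val) for fit, val in pairs)

    return max(candidates, key=cv_score)
```

The held-out 20% is never passed to `select_c`, so the chosen C is tuned only on training data, which matches the quoted description. The candidate grid for C is not given in the quoted text and would have to be chosen by the user.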