Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Spectral Clustering in Heterogeneous Information Networks

Authors: Xiang Li, Ben Kao, Zhaochun Ren, Dawei Yin
Pages: 4221-4228

AAAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments comparing SClump with other state-of-the-art clustering algorithms on HINs. Our results show that SClump outperforms the competitors over a range of datasets w.r.t. different clustering quality measures.
Researcher Affiliation	Collaboration	Xiang Li (1), Ben Kao (2), Zhaochun Ren (1), Dawei Yin (1). (1) Data Science Lab, JD.com, China; (2) Department of Computer Science, The University of Hong Kong, Hong Kong.
Pseudocode	Yes	Algorithm 1 SClump
Open Source Code	No	The paper does not provide a link to its source code or state that it is open-source.
Open Datasets	Yes	We use three datasets Freebase, DBLP and Yelp in the experiments. Freebase is a knowledge base that models entities and their relationships as a graph. DBLP is a bibliographic network of scientific publications. Yelp is a business referral service, whose data includes various information of businesses such as customer reviews.
Dataset Splits	No	The paper describes how clustering quality is evaluated (NMI, purity, RI) but does not specify any training, validation, or test splits. Such splits are typical for supervised learning but less common for unsupervised clustering, which is usually evaluated on the full dataset.
Hardware Specification	No	The paper does not provide any details regarding the hardware specifications (e.g., CPU, GPU, memory) used for conducting the experiments.
Software Dependencies	No	The paper mentions 'k-means' as a post-processing step and refers to other methods, but it does not specify any software dependencies or their version numbers required to reproduce the work (e.g., specific programming languages, libraries, or frameworks with versions).
Experiment Setup	Yes	For SClump, we set α = 0.1, β = 10 for Yelp-R and α = 0.5, β = 10 for other clustering tasks. Moreover, γ is set according to (Nie et al. 2016).
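The Dataset Splits entry notes that the paper reports clustering quality as NMI, purity, and RI (Rand index) computed on the full dataset rather than on held-out splits. As an illustration only, the sketch below computes these three measures from a ground-truth labeling and a predicted clustering in plain Python. It is not the authors' evaluation code: the function names are hypothetical, and the NMI normalization (arithmetic mean of the two label entropies) is an assumption, since the entry does not state which NMI variant the paper uses.

```python
from collections import Counter
from itertools import combinations
from math import log


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    # For each predicted cluster, count how many members share its most common true label.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which both labelings agree (same vs. different group)."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred
        total += 1
    return agree / total


def nmi(labels_true, labels_pred):
    """NMI normalized by the arithmetic mean of entropies (assumed variant):
    I(T; P) / ((H(T) + H(P)) / 2)."""
    n = len(labels_true)
    count_t = Counter(labels_true)
    count_p = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information from the contingency counts.
    mi = sum(
        (c / n) * log(c * n / (count_t[t] * count_p[p])) for (t, p), c in joint.items()
    )
    h_t = -sum((c / n) * log(c / n) for c in count_t.values())
    h_p = -sum((c / n) * log(c / n) for c in count_p.values())
    if h_t + h_p == 0:
        # Both labelings put every point in a single group: treat as identical.
        return 1.0
    return 2 * mi / (h_t + h_p)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    pred = [0, 0, 1, 1, 1, 1]
    print(round(purity(truth, pred), 4))      # 0.8333
    print(round(rand_index(truth, pred), 4))  # 0.6667
    print(nmi(truth, pred))
```

All three measures are invariant to relabeling the clusters, which is why they suit unsupervised evaluation on the full dataset: a prediction of `[1, 1, 1, 0, 0, 0]` against the truth above scores 1.0 on each.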