Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Balanced Ranking with Relative Centrality: A multi-core periphery perspective
Authors: Chandra Sekhar Mukherjee, Jiapeng Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical and extensive simulation support for our approach towards resolving the unbalancedness in MCPC. Finally, we consider graph embeddings of 11 single-cell datasets. We observe that the top-ranked points as per existing centrality measures are better separable into the ground-truth communities. However, due to the unbalanced ranking, the top nodes often do not contain points from some communities. Here, our relative-centrality-based approach generates a ranking that provides a similar improvement in clusterability while providing significantly higher balancedness. |
| Researcher Affiliation | Academia | Chandra Sekhar Mukherjee and Jiapeng Zhang, Thomas Lord Department of Computer Science, University of Southern California |
| Pseudocode | Yes | Algorithm 1: Neighbor Rank (N-Rank) with t-step initialization; Algorithm 2: a meta generalization, Meta-Relative-Rank(t, y, z) |
| Open Source Code | Yes | We have shared our code for the simulation and real-world data in the supplementary material. The simulation experiments can be run using the simulation.ipynb file, which is self-contained (the needed modules are provided in the zip). Due to the large size of the real-world vector datasets, we are unable to share them, but we have shared the code used to run the experiments. |
| Open Datasets | Yes | We use the 7 datasets from a recent database (Abdelaal et al., 2019), the popular Zheng8eq dataset (Duò et al., 2018), two more large datasets (Smith et al., 2019), and a T-cell dataset (Savas et al., 2018) of cancer patients. All of these datasets have annotated labels available for their corresponding cell types, which form the underlying communities. |
| Dataset Splits | No | The paper describes selecting a 'c-fraction' of top-ranked points (e.g., c=0.2) and applying clustering to the induced subgraph. This is a selection process for analysis rather than a traditional train/test/validation split for a machine learning model. |
| Hardware Specification | No | The paper does not mention specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper states that 'needed modules are provided in the zip' for simulation experiments but does not list specific software libraries or tools with their version numbers (e.g., 'Python 3.8', 'PyTorch 1.9'). |
| Experiment Setup | Yes | For each dataset, we first log-normalize it and then apply PCA dimensionality reduction to 50 dimensions, which is a standard pipeline in the single-cell analysis literature (Duò et al., 2018). Then, we obtain its 20-NN graph embedding, which we denote as G0. We set c = 0.2 (the results are robust to the choice of the cutoff point). In our experiments, we set t = 1 for the graphs generated by the MCPC block model and t = log \|V\| for both the concentric GMM and the real-world experiments. |
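The preprocessing pipeline quoted in the Experiment Setup row (log-normalize, PCA to 50 dimensions, 20-NN graph) can be sketched as follows. This is a minimal illustration, not the authors' released code: the function name `build_knn_graph` and the use of scikit-learn's `PCA` and `kneighbors_graph` are our assumptions, since the paper does not name its software stack.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph

def build_knn_graph(counts, n_pcs=50, k=20):
    """Sketch of the quoted pipeline: log-normalize a cells x genes count
    matrix, reduce to n_pcs principal components, and return the k-NN
    graph as a sparse (CSR) connectivity matrix."""
    X = np.log1p(counts)                        # log-normalization
    n_pcs = min(n_pcs, min(X.shape) - 1)        # guard for small toy inputs
    Z = PCA(n_components=n_pcs).fit_transform(X)
    # Each row of the result has exactly k ones (self excluded),
    # i.e. the graph the paper denotes G0.
    return kneighbors_graph(Z, n_neighbors=k, mode="connectivity")

# Per the quoted setup, downstream analysis keeps the top c = 0.2 fraction
# of ranked points, with t = 1 for MCPC block-model graphs and
# t = log|V| for concentric GMM and real-world data.
```

The `min(X.shape) - 1` guard only matters for toy matrices smaller than 50 in either dimension; on real single-cell data the full 50 components are used as described.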