Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SpEx: A Spectral Approach to Explainable Clustering

Authors: Tal Argov, Tal Wagner

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show the favorable performance of our method compared to baselines on a range of datasets. (...) Section 4, Experimental Evaluation: We evaluate our methods compared to baselines, on eight public real-world datasets of various sizes and dimensions, detailed in Table 1.
Researcher Affiliation	Academia	Tal Argov, Tel Aviv University, EMAIL; Tal Wagner, Tel Aviv University, EMAIL
Pseudocode	Yes	Algorithm 1 SPEX. input: Dataset X ⊂ R^d, graph G(X, E, w), target number of clusters ℓ; output: Decision tree T where every internal node is associated with a coordinate j and threshold τ. BUILDTREE(X, G, ℓ): T ← initialize a tree with a single node v; j, τ ← argmin_{j,τ} CUTSCORE(X, j, τ); Q ← initialize a maximum priority queue for the tree leaves, with priorities given by LEAFSCORE; Q.push(v, X, j, τ); while T has less than ℓ leaves do: v, X_v, j_v, τ_v ← Q.pop(); associate v with the cut j_v, τ_v; split v into two new leaves v_L, v_R; X_vL ← S_{j_v,τ_v}(X_v); X_vR ← X_v \ X_vL; j_L, τ_L ← argmin_{j,τ} CUTSCORE(X_vL, j, τ); j_R, τ_R ← argmin_{j,τ} CUTSCORE(X_vR, j, τ); Q.push(v_L, X_vL, j_L, τ_L); Q.push(v_R, X_vR, j_R, τ_R); return T. CUTSCORE(X, j, τ): if S_{j,τ}(X) = ∅ or S_{j,τ}(X) = X then return ∞; return ψ_G(S_{j,τ}(X)) + ψ_G(X_v \ S_{j,τ}(X)). LEAFSCORE(v, X, j, τ): return ψ_G(X) − CUTSCORE(X, j, τ).
Open Source Code	Yes	Our code is available online (footnote 4: https://github.com/talargv/SpEx).
Open Datasets	Yes	We evaluate our methods compared to baselines, on eight public real-world datasets of various sizes and dimensions, detailed in Table 1.
Dataset Splits	Yes	Table 1: Datasets. Training set only.
Hardware Specification	No	Our experiments are run on a standard regular free-tier cloud CPU machine (Google Colab) and can be reproduced on any similar machine; our methods and experiments do not require special computational resources. No additional compute beyond what is described in the paper was used in the course of preparing this paper.
Software Dependencies	No	The implementation of CART in Scikit-Learn [36] contains a weighted variant different from the standard one described in Section 3.3.
Experiment Setup	Yes	input: Dataset X ⊂ R^d, graph G(X, E, w), target number of clusters ℓ; output: Decision tree T where every internal node is associated with a coordinate j and threshold τ.
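
The greedy loop quoted in the Pseudocode row can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation (their code is in the repository linked above). The excerpt does not define ψ_G, so the `psi` function below assumes a normalized-cut-style score (edge weight leaving a set divided by the set's volume). Taking candidate thresholds as midpoints between consecutive distinct coordinate values is also an assumption. The names `psi`, `best_cut`, and `build_tree` are hypothetical.

```python
import heapq
import itertools

import numpy as np


def psi(W, idx):
    """Assumed psi_G: edge weight leaving idx divided by the volume of idx."""
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[idx] = True
    vol = W[mask].sum()
    cut = W[np.ix_(mask, ~mask)].sum()
    return cut / vol if vol > 0 else np.inf


def best_cut(X, W, idx):
    """argmin over (j, tau) of CUTSCORE restricted to the points in idx."""
    best = (np.inf, None, None)
    for j in range(X.shape[1]):
        vals = np.unique(X[idx, j])
        # Assumption: candidate thresholds are midpoints of adjacent values.
        for tau in (vals[:-1] + vals[1:]) / 2:
            left = idx[X[idx, j] <= tau]
            right = idx[X[idx, j] > tau]
            if len(left) == 0 or len(right) == 0:
                continue  # trivial cut: CUTSCORE is infinite
            score = psi(W, left) + psi(W, right)
            if score < best[0]:
                best = (score, j, tau)
    return best


def build_tree(X, W, n_leaves):
    """Greedily split leaves until the tree has n_leaves leaves."""
    tie = itertools.count()  # tie-breaker so the heap never compares arrays
    heap = []  # holds exactly the current leaves

    def push(idx):
        score, j, tau = best_cut(X, W, idx)
        # LEAFSCORE = psi_G(X_v) - CUTSCORE; heapq is a min-heap, so negate.
        leaf_score = psi(W, idx) - score if j is not None else -np.inf
        heapq.heappush(heap, (-leaf_score, next(tie), idx, j, tau))

    push(np.arange(X.shape[0]))
    cuts = []
    while len(heap) < n_leaves:
        _, _, idx, j, tau = heap[0]
        if j is None:
            break  # no remaining leaf can be split
        heapq.heappop(heap)
        cuts.append((j, float(tau)))
        go_left = X[idx, j] <= tau
        push(idx[go_left])
        push(idx[~go_left])

    labels = np.empty(X.shape[0], dtype=int)
    for leaf_id, (_, _, idx, _, _) in enumerate(heap):
        labels[idx] = leaf_id
    return cuts, labels
```

Here `W` is a symmetric weight matrix for the graph G with a zero diagonal, and `X` is the data matrix with one point per row. The function returns the list of `(coordinate, threshold)` cuts in the order they were applied, together with a leaf label for each point.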