Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

SpEx: A Spectral Approach to Explainable Clustering

Authors: Tal Argov, Tal Wagner

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show the favorable performance of our method compared to baselines on a range of datasets. (...) 4 Experimental Evaluation: We evaluate our methods compared to baselines, on eight public real-world datasets of various sizes and dimensions, detailed in Table 1.
Researcher Affiliation | Academia | Tal Argov, Tel Aviv University, EMAIL; Tal Wagner, Tel Aviv University, EMAIL
Pseudocode | Yes | Algorithm 1 SPEX:
    input: Dataset X ⊂ ℝ^d, graph G(X, E, w), target number of clusters ℓ
    output: Decision tree T where every internal node is associated with a coordinate j and threshold τ

    BUILDTREE(X, G, ℓ):
        T ← initialize a tree with a single node v
        j, τ ← argmin_{j,τ} CUTSCORE(X, j, τ)
        Q ← initialize a maximum priority queue for the tree leaves, with priorities given by LEAFSCORE
        Q.push(v, X, j, τ)
        while T has less than ℓ leaves do
            v, X_v, j_v, τ_v ← Q.pop()
            Associate v with the cut j, τ
            Split v into two new leaves v_L, v_R
            X_{v_L} ← S_{j_v,τ_v}(X_v)
            X_{v_R} ← X_v \ X_{v_L}
            j_L, τ_L ← argmin_{j,τ} CUTSCORE(X_{v_L}, j, τ)
            j_R, τ_R ← argmin_{j,τ} CUTSCORE(X_{v_R}, j, τ)
            Q.push(v_L, X_{v_L}, j_L, τ_L)
            Q.push(v_R, X_{v_R}, j_R, τ_R)
        return T

    CUTSCORE(X′, j, τ):
        if S_{j,τ}(X′) = ∅ or S_{j,τ}(X′) = X′ then return ∞
        return ψ_G(S_{j,τ}(X′)) + ψ_G(X′ \ S_{j,τ}(X′))

    LEAFSCORE(v, X′, j, τ):
        return ψ_G(X′) − CUTSCORE(X′, j, τ)
Open Source Code | Yes | Our code is available online. (Footnote 4: https://github.com/talargv/SpEx)
Open Datasets | Yes | We evaluate our methods compared to baselines, on eight public real-world datasets of various sizes and dimensions, detailed in Table 1.
Dataset Splits | Yes | Table 1: Datasets. Training set only.
Hardware Specification | No | Our experiments are run on a standard regular free-tier cloud CPU machine (Google Colab) and can be reproduced on any similar machine; our methods and experiments do not require special computational resources. No additional compute beyond what is described in the paper was used in the course of preparing this paper.
Software Dependencies | No | The implementation of CART in Scikit-Learn [36] contains a weighted variant different from the standard one described in Section 3.3.
Experiment Setup | Yes | input: Dataset X ⊂ ℝ^d, graph G(X, E, w), target number of clusters ℓ output: Decision tree T where every internal node is associated with a coordinate j and threshold τ
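The greedy tree construction quoted in the Pseudocode row can be illustrated with a small Python sketch. This is not the authors' implementation (that lives in their repository). It rests on several assumptions that the excerpt above does not confirm: the score ψ_G(S) is taken to be the normalized cut cut(S, rest) / vol(S), the leaf priority is taken to be ψ_G(X′) minus the best cut score, thresholds are tried only at observed coordinate values, and the Gaussian-kernel graph is an arbitrary choice for the demo. The names `gaussian_graph`, `psi`, `best_cut`, and `build_tree` are hypothetical.

```python
import heapq
import itertools
import math

import numpy as np


def gaussian_graph(X, sigma=1.0):
    """Dense Gaussian-kernel weight matrix with zero diagonal (demo assumption)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def psi(W, idx):
    """Assumed score: weight leaving the subset divided by its volume."""
    if len(idx) == 0:
        return math.inf
    vol = W[idx].sum()                  # sum of weighted degrees in the subset
    if vol == 0:
        return 0.0                      # isolated points: treat as zero cost
    inside = W[np.ix_(idx, idx)].sum()  # internal edges, each counted twice
    return (vol - inside) / vol


def best_cut(X, W, idx):
    """Best axis-aligned cut (score, j, tau) over the points in idx."""
    best = (math.inf, None, None)
    for j in range(X.shape[1]):
        values = np.unique(X[idx, j])
        for tau in values[:-1]:         # skipping the max keeps both sides nonempty
            mask = X[idx, j] <= tau
            score = psi(W, idx[mask]) + psi(W, idx[~mask])
            if score < best[0]:
                best = (score, j, float(tau))
    return best


def build_tree(X, W, n_leaves):
    """Greedily split leaves until n_leaves exist; returns (labels, cuts)."""
    tie = itertools.count()             # tie-breaker so arrays are never compared
    heap = []                           # holds every current leaf

    def push(idx):
        score, j, tau = best_cut(X, W, idx)
        gain = psi(W, idx) - score      # assumed leaf priority; -inf if no valid cut
        heapq.heappush(heap, (-gain, next(tie), idx, j, tau))

    push(np.arange(X.shape[0]))
    cuts = []
    while len(heap) < n_leaves:
        item = heapq.heappop(heap)      # leaf with the largest gain
        _, _, idx, j, tau = item
        if j is None:                   # no leaf can be split any further
            heapq.heappush(heap, item)
            break
        cuts.append((j, tau))
        push(idx[X[idx, j] <= tau])
        push(idx[X[idx, j] > tau])

    labels = np.empty(X.shape[0], dtype=int)
    for leaf_id, (_, _, idx, _, _) in enumerate(heap):
        labels[idx] = leaf_id
    return labels, cuts
```

Calling `build_tree(X, gaussian_graph(X), n_leaves)` returns one leaf label per point together with the list of (coordinate, threshold) cuts in the order they were applied. The exhaustive threshold search is quadratic or worse in the number of points, so the sketch is only meant for small inputs.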