Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SpEx: A Spectral Approach to Explainable Clustering
Authors: Tal Argov, Tal Wagner
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show the favorable performance of our method compared to baselines on a range of datasets. (...) Section 4, Experimental Evaluation: We evaluate our methods compared to baselines, on eight public real-world datasets of various sizes and dimensions, detailed in Table 1. |
| Researcher Affiliation | Academia | Tal Argov, Tel Aviv University; Tal Wagner, Tel Aviv University |
| Pseudocode | Yes | Algorithm 1 (SPEX). Input: dataset X ⊂ ℝ^d, graph G(X, E, w), target number of clusters ℓ. Output: decision tree T where every internal node is associated with a coordinate j and threshold τ. BUILDTREE(X, G, ℓ): T ← initialize a tree with a single node v; (j, τ) ← argmin_{j,τ} CUTSCORE(X, j, τ); Q ← initialize a maximum priority queue for the tree leaves, with priorities given by LEAFSCORE; Q.push(v, X, j, τ); while T has fewer than ℓ leaves do: (v, X_v, j_v, τ_v) ← Q.pop(); associate v with the cut (j_v, τ_v); split v into two new leaves v_L, v_R; X_{v_L} ← S_{j_v,τ_v}(X_v); X_{v_R} ← X_v \ X_{v_L}; (j_L, τ_L) ← argmin_{j,τ} CUTSCORE(X_{v_L}, j, τ); (j_R, τ_R) ← argmin_{j,τ} CUTSCORE(X_{v_R}, j, τ); Q.push(v_L, X_{v_L}, j_L, τ_L); Q.push(v_R, X_{v_R}, j_R, τ_R); end while; return T. CUTSCORE(X′, j, τ): if S_{j,τ}(X′) = ∅ or S_{j,τ}(X′) = X′ then return ∞; otherwise return ψ_G(S_{j,τ}(X′)) + ψ_G(X′ \ S_{j,τ}(X′)). LEAFSCORE(v, X′, j, τ): return ψ_G(X′) − CUTSCORE(X′, j, τ). |
| Open Source Code | Yes | Our code is available online (footnote 4: https://github.com/talargv/SpEx). |
| Open Datasets | Yes | We evaluate our methods compared to baselines, on eight public real-world datasets of various sizes and dimensions, detailed in Table 1. |
| Dataset Splits | Yes | Table 1: Datasets. Training set only. |
| Hardware Specification | No | Our experiments are run on a standard regular free-tier cloud CPU machine (Google Colab) and can be reproduced on any similar machine; our methods and experiments do not require special computational resources. No additional compute beyond what is described in the paper was used in the course of preparing this paper. |
| Software Dependencies | No | The implementation of CART in Scikit-Learn [36] contains a weighted variant different from the standard one described in Section 3.3. |
| Experiment Setup | Yes | Input: dataset X ⊂ ℝ^d, graph G(X, E, w), target number of clusters ℓ. Output: decision tree T where every internal node is associated with a coordinate j and threshold τ. |
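
The greedy procedure quoted in the Pseudocode row can be illustrated with a short Python sketch. This is not the authors' implementation, and several pieces are assumptions made only for illustration: the excerpt does not define ψ_G or S_{j,τ}, so the sketch takes S_{j,τ}(X′) to be the points of X′ whose j-th coordinate is at most τ, and uses a normalized-cut style score (cut weight divided by volume) as a stand-in for ψ_G. All function names (`make_psi`, `threshold_side`, `cut_score`, `best_cut`, `build_tree`) are hypothetical, and the exhaustive threshold search favors clarity over speed.

```python
import heapq
import itertools
import math

import numpy as np


def make_psi(W):
    """Assumed stand-in for psi_G: cut weight of a subset divided by its volume."""
    def psi(idx):
        mask = np.zeros(len(W), dtype=bool)
        mask[idx] = True
        vol = W[mask].sum()
        cut = W[mask][:, ~mask].sum()
        return cut / vol if vol > 0 else math.inf
    return psi


def threshold_side(X, idx, j, tau):
    """Assumed S_{j,tau}: members of idx whose j-th coordinate is <= tau."""
    return idx[X[idx, j] <= tau]


def cut_score(X, psi, idx, j, tau):
    """CUTSCORE: infinite for a trivial cut, else psi of both sides summed."""
    left = threshold_side(X, idx, j, tau)
    if len(left) == 0 or len(left) == len(idx):
        return math.inf
    return psi(left) + psi(np.setdiff1d(idx, left))


def best_cut(X, psi, idx):
    """Exhaustive argmin of CUTSCORE over coordinates and observed thresholds."""
    best = (math.inf, None, None)
    for j in range(X.shape[1]):
        for tau in np.unique(X[idx, j])[:-1]:
            score = cut_score(X, psi, idx, j, tau)
            if score < best[0]:
                best = (score, j, tau)
    return best


def build_tree(X, psi, n_leaves):
    """Split the highest-priority leaf until n_leaves leaves exist.

    Returns the leaves (index arrays) and the cuts (j, tau) in the order applied.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares arrays
    heap, done, cuts = [], [], []

    def push(idx):
        score, j, tau = best_cut(X, psi, idx)
        if j is None:
            done.append(idx)  # no valid cut: this leaf stays as it is
        else:
            # heapq is a min-heap, so negate LEAFSCORE = psi(X') - CUTSCORE.
            heapq.heappush(heap, (-(psi(idx) - score), next(tie), idx, j, tau))

    push(np.arange(len(X)))
    while heap and len(heap) + len(done) < n_leaves:
        _, _, idx, j, tau = heapq.heappop(heap)
        cuts.append((j, float(tau)))
        left = threshold_side(X, idx, j, tau)
        push(left)
        push(np.setdiff1d(idx, left))
    return done + [item[2] for item in heap], cuts


# Toy usage: two well-separated groups of points with a Gaussian similarity graph.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0]], dtype=float)
d2 = ((X_demo[:, None, :] - X_demo[None, :, :]) ** 2).sum(-1)
W_demo = np.exp(-d2)
np.fill_diagonal(W_demo, 0.0)
leaves, cuts = build_tree(X_demo, make_psi(W_demo), n_leaves=2)
```

With a different choice of ψ_G only `make_psi` needs to change; the priority queue logic is independent of how a subset is scored.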