Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Angular Constraint Embedding via SpherePair Loss for Constrained Clustering

Authors: Shaojie Zhang, Ke Chen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comparative evaluations with state-of-the-art DCC methods on diverse benchmarks, along with empirical validation of theoretical insights, confirm its superior performance, scalability, and overall real-world effectiveness. Code is available at our repository. 1 Introduction ... 5 Experiments
Researcher Affiliation	Academia	Shaojie Zhang Ke Chen Department of Computer Science, The University of Manchester, Manchester M13 9PL, U.K. EMAIL
Pseudocode	Yes	The SpherePairCC algorithm is outlined in Algorithm 1 in Appendix B. ... The cluster number inference algorithm is presented as Algorithm 2 in Appendix B.
Open Source Code	Yes	Code is available at our repository. ... Our source code is available on GitHub: https://github.com/spherepaircc/SpherePairCC/tree/main
Open Datasets	Yes	We adopt eight benchmarks with diverse class counts and class balance: CIFAR-100-20 and CIFAR-10 [45], Fashion MNIST [46], ImageNet-10 [47], MNIST [48], STL-10 [49], together with two imbalanced text datasets, Reuters subset [50] and RCV1-10 (see Appendix E.1 for details).
Dataset Splits	Yes	For Fashion MNIST, MNIST, and the Reuters subset, we use the original pre-split training and test data settings. For the remaining benchmarks, we randomly split the data into 80% training and 20% test sets. Consistent with [14], we reserve a validation set of 1,000 instances from the training data to optimise the hyperparameters for baselines requiring such tuning. ... we randomly split each dataset into 80% for training and 20% for testing, resulting in 48,000/12,000 samples for training/testing in CIFAR-10 and CIFAR-100-20, 10,400/2,600 in STL-10 and ImageNet-10, and 142,135/35,534 in RCV1-10.
Hardware Specification	Yes	Experiments are conducted on Tesla V100 GPU with 16 GB of memory.
Software Dependencies	Yes	We implement all methods (except DCGMM) in PyTorch 1.5.1 with Python 3.7. For SpherePair and AutoEmbedder, we use scikit-learn's K-means implementation and fastcluster's efficient hierarchical clustering implementation for clustering.
Experiment Setup	Yes	For Vanilla DCC and VolMaxDCC, we use a fully connected network with two hidden layers of size 512-512 and a classification layer matching the number of clusters, K, as recommended in [14]. ReLU activations are used across all networks. Pretrained autoencoders are employed for model initialisation, except for Vanilla DCC and VolMaxDCC. ... In SpherePair and cluster number inference, ω is theoretically fixed at 2 as per Sect. 4, while λ = 0.02 and ρ = 0.05 are used by default unless varied for hyperparameter robustness evaluation. For baselines, we adopt reported optimal hyperparameters (Vanilla DCC, DCGMM, CIDEC, SDEC) or follow the search procedures in VolMaxDCC and AutoEmbedder. Training is conducted using the Adam optimizer, except for SDEC and VolMaxDCC, which employ SGD as suggested by their authors.
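
The split sizes quoted in the Dataset Splits row can be checked with a small sketch. This is an illustration only, not the authors' code: the function name split_indices, the seed, and the choice to take the 1,000 validation instances from the shuffled training portion are assumptions made here. The row does not say whether the reported training counts include the validation instances; the sketch assumes they do.

```python
import random


def split_indices(n_samples, train_frac=0.8, n_val=1000, seed=0):
    """Hypothetical helper: random 80/20 train/test split, then hold out
    n_val validation indices from the training portion."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_frac)
    train_full, test = indices[:n_train], indices[n_train:]
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


# CIFAR-10 has 60,000 images in total (training and test sets pooled).
train, val, test = split_indices(60_000)
print(len(train) + len(val), len(test))
```

With 60,000 samples the training portion (including validation) has 48,000 indices and the test portion 12,000, which agrees with the CIFAR-10 figures in the table; a total of 177,669 gives the 142,135/35,534 split reported for RCV1-10.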