Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Balanced Clustering via Exclusive Lasso: A Pragmatic Approach

Authors: Zhihui Li, Feiping Nie, Xiaojun Chang, Zhigang Ma, Yi Yang

AAAI 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms compared to the state-of-the-art clustering algorithms. In this section, extensive experiments are conducted to evaluate the proposed clustering methods.
Researcher Affiliation	Collaboration	Zhihui Li (1), Feiping Nie (2), Xiaojun Chang (3), Zhigang Ma (3), Yi Yang (4). (1) Beijing Etrol Technologies Co., Ltd. (2) Centre for OPTical Imagery Analysis and Learning, Northwestern Polytechnical University. (3) School of Computer Science, Carnegie Mellon University. (4) Centre for Artificial Intelligence, University of Technology Sydney.
Pseudocode	Yes	Algorithm 1 Algorithm to solve the objective function of balanced k-means. Algorithm 2 Algorithm to solve the objective function of balanced min-cut.
Open Source Code	No	The paper does not provide any links to open-source code or state that code will be made available.
Open Datasets	Yes	MNIST Handwritten Digit Dataset: The MNIST handwritten digit dataset (Le Cun et al. 2011). Yale B face dataset: The Yale B dataset (Georghiades, Belhumeur, and Kriegman 2001). ORL face dataset: The ORL dataset (Samaria and Harter 1994). JAFFE Japanese Female Facial Expression dataset: The JAFFE dataset (Lyons, Budynek, and Akamatsu 1999). Human EVA Motion dataset (http://vision.cs.brown.edu/humaneva/). Coil20 Object dataset: We use the Coil20 dataset (Nene, Nayar, and Murase 1996).
Dataset Splits	No	The paper mentions that for MNIST, 'The dataset contains 60,000 training images and 10,000 testing images. We merge all the training and testing images in the experiments.' This indicates no separate validation split, and the existing test split was merged, so specific train/validation/test splits for reproducibility are not provided.
Hardware Specification	No	The paper does not specify any hardware details such as CPU, GPU models, or memory used for experiments.
Software Dependencies	No	The paper does not list any specific software dependencies with version numbers.
Experiment Setup	Yes	For the regularization parameter γ in Eq. (13) and Eq. (24), we tune them by a grid-search strategy from {10^-6, 10^-4, 10^-2, 10^0, 10^2, 10^4, 10^6}. We similarly tune the regularization parameters of all the comparison algorithms from the aforementioned range.
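
The grid search quoted under Experiment Setup can be sketched as below. This is a minimal illustration, not the authors' code: the function names `grid_search` and `toy_score` are hypothetical, and the selection criterion (keeping the γ with the highest score) is an assumption, since the quoted text does not say how the best value is chosen. Only the seven grid values come from the paper.

```python
import math

# Grid quoted from the paper: powers of ten from 10^-6 to 10^6, exponent step 2.
GAMMA_GRID = [10.0 ** e for e in range(-6, 7, 2)]


def grid_search(score_fn, grid=GAMMA_GRID):
    """Return (best_gamma, best_score) for the gamma that maximizes score_fn.

    score_fn is a hypothetical callable that runs one clustering experiment
    for a given gamma and returns a quality score (higher is better).
    Maximizing the score is an assumption, not something the paper states.
    """
    best_gamma, best_score = None, -math.inf
    for gamma in grid:
        score = score_fn(gamma)
        if score > best_score:
            best_gamma, best_score = gamma, score
    return best_gamma, best_score


def toy_score(gamma):
    """Placeholder score peaking at gamma = 10^2; stands in for a real run."""
    return -abs(math.log10(gamma) - 2.0)


best_gamma, best_score = grid_search(toy_score)
print(best_gamma)
```

In a real replication, `toy_score` would be replaced by a function that runs the clustering method with the given γ and returns a metric such as clustering accuracy. The same grid would then be reused for the regularization parameters of the comparison algorithms, as the quoted setup describes.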