Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
On hyperparameter tuning in general clustering problems
Authors: Xinjie Fan, Yuguang Yue, Purnamrita Sarkar, Y. X. Rachel Wang
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a variety of simulation and real data experiments, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings. (...) Finally, Section 5 contains detailed simulated and real data experiments |
| Researcher Affiliation | Academia | (1) Department of Statistics and Data Sciences, University of Texas at Austin; (2) School of Mathematics and Statistics, University of Sydney. |
| Pseudocode | Yes | Algorithm 1 MAx-TRace (MATR) for known r. (...) Algorithm 2 MATR-CV. (...) Algorithm 3 Splitting (...) Algorithm 4 Cluster Test |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We compare MATR with ECV and CL on the football (Girvan & Newman, 2002), political books and the political blogs (Adamic & Glance, 2005) datasets. (...) test set provided by (Pedregosa et al., 2011) of the Optical Recognition of Handwritten Digits Data Set (...) Avila dataset (https://archive.ics.uci.edu/ml/datasets/Avila) |
| Dataset Splits | Yes | Algorithm 2 MATR-CV. (...) training ratio γtrain, trace gap for j = 1 : J do (...) We use a training ratio of 0.9 and the L2 loss throughout. |
| Hardware Specification | Yes | MATR-CV takes around 2 hours to complete while SIL takes around 7 hours and GAP takes around 30 hours to finish on a single node of two Xeon E5-2690 v3 with 24 cores. |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | Since λ ∈ [0, 1] for SDP-1, we choose λ ∈ {0, …, 20}/20 in all the examples. (...) our candidate set of θ is {tα/20} for t = 1, …, 20 and α = max_{i,j} ‖Y_i − Y_j‖_2. (...) We vary c from 0 to 200. (...) For all methods, we set the maximal number of clusters to be square root of the dataset size. |
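
The candidate grids quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code (the table above reports that none is released). The function names are hypothetical, α is assumed to be the largest pairwise Euclidean distance between data points, and rounding the square root down to an integer is also an assumption.

```python
import math
from itertools import combinations


def lambda_grid(steps=20):
    """Candidates {0, 1, ..., steps} / steps: an even grid on [0, 1]."""
    return [t / steps for t in range(steps + 1)]


def theta_grid(points, steps=20):
    """Candidates {t * alpha / steps} for t = 1, ..., steps.

    Assumption: alpha is the largest pairwise Euclidean distance
    between the rows of `points` (at least two points are required).
    """
    alpha = max(math.dist(p, q) for p, q in combinations(points, 2))
    return [t * alpha / steps for t in range(1, steps + 1)]


def max_clusters(n):
    """Upper bound on the number of clusters: sqrt(n), rounded down (assumption)."""
    return math.isqrt(n)
```

For example, `theta_grid([(0.0, 0.0), (3.0, 4.0)])` spans 20 evenly spaced values ending at the largest pairwise distance, and `max_clusters(100)` gives the cluster cap for a dataset of 100 points.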