reproducibilityindex.ai

On hyperparameter tuning in general clustering problems

Authors: Xinjie Fan, Yuguang Yue, Purnamrita Sarkar, Y. X. Rachel Wang

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In a variety of simulation and real data experiments, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings. (...) Finally, Section 5 contains detailed simulated and real data experiments
Researcher Affiliation	Academia	(1) Department of Statistics and Data Sciences, University of Texas at Austin; (2) School of Mathematics and Statistics, University of Sydney.
Pseudocode	Yes	Algorithm 1 MAx-TRace (MATR) for known r. (...) Algorithm 2 MATR-CV. (...) Algorithm 3 Splitting (...) Algorithm 4 Cluster Test
Open Source Code	No	The paper does not provide any explicit statement or link for open-source code for the described methodology.
Open Datasets	Yes	We compare MATR with ECV and CL on the football (Girvan & Newman, 2002), political books and the political blogs (Adamic & Glance, 2005) datasets. (...) test set provided by (Pedregosa et al., 2011) of the Optical Recognition of Handwritten Digits Data Set (...) Avila dataset (https://archive.ics.uci.edu/ml/datasets/Avila)
Dataset Splits	Yes	Algorithm 2 MATR-CV. (...) training ratio γ_train, trace gap for j = 1 : J do (...) We use a training ratio of 0.9 and the L2 loss throughout.
Hardware Specification	Yes	MATR-CV takes around 2 hours to complete while SIL takes around 7 hours and GAP takes around 30 hours to finish on a single node of two Xeon E5-2690 v3 with 24 cores.
Software Dependencies	No	The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup	Yes	Since λ ∈ [0, 1] for SDP-1, we choose λ ∈ {0, …, 20}/20 in all the examples. (...) our candidate set of θ is {tα/20} for t = 1, …, 20 and α = max_{i,j} ‖Y_i − Y_j‖_2. (...) We vary c from 0 to 200. (...) For all methods, we set the maximal number of clusters to be the square root of the dataset size.
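
The candidate grids quoted in the Experiment Setup row can be written out concretely. The following is a minimal sketch, not the authors' code (the paper provides none): the function names are hypothetical, α is assumed to be the largest pairwise Euclidean distance between data points (the norm in the extracted text is ambiguous), and rounding the square root down to an integer is likewise an assumption.

```python
import math

import numpy as np


def lambda_grid(steps: int = 20) -> np.ndarray:
    """Candidates {0, 1, ..., steps} / steps, evenly spaced on [0, 1]."""
    return np.arange(steps + 1) / steps


def theta_grid(Y: np.ndarray, steps: int = 20) -> np.ndarray:
    """Candidates t * alpha / steps for t = 1, ..., steps.

    Assumption: alpha is the maximum pairwise Euclidean distance
    between the rows of Y.
    """
    diffs = Y[:, None, :] - Y[None, :, :]          # shape (n, n, d)
    alpha = np.linalg.norm(diffs, axis=-1).max()   # largest pairwise distance
    return np.arange(1, steps + 1) * alpha / steps


def max_clusters(n_samples: int) -> int:
    """Upper bound on the number of clusters: sqrt(n), rounded down (assumption)."""
    return math.isqrt(n_samples)


if __name__ == "__main__":
    Y = np.array([[0.0, 0.0], [3.0, 4.0]])  # toy data, not from the paper
    print(lambda_grid())
    print(theta_grid(Y))
    print(max_clusters(100))
```

The pairwise-difference array takes O(n²d) memory, so this form only suits small datasets; a blockwise computation would be needed at larger scale.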