reproducibilityindex.ai

Clustering High Dimensional Categorical Data via Topographical Features

Authors: Chao Chen, Novi Quadrianto

ICML 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments show that our principled method outperforms state-of-the-art clustering methods while also admitting an embarrassingly parallel property.
Researcher Affiliation	Academia	Chao Chen CHAO.CHEN@QC.CUNY.EDU CUNY Queens College & Graduate Center, New York, NY, USA; Novi Quadrianto N.QUADRIANTO@SUSSEX.AC.UK SMiLe CLiNiC, University of Sussex, Brighton, UK
Pseudocode	Yes	Algorithm 1 Discrete-Clustering; Algorithm 2 Compute-Next
Open Source Code	No	The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	We use synthetic, UCI and biological datasets. See Table 1 for a summary of different datasets. UCI datasets. We use several categorical datasets from the UCI repository (Lichman, 2013)... Biological datasets. We use DNA barcoding datasets from (Kuksa & Pavlovic, 2009).
Dataset Splits	No	The paper does not provide specific details on training, validation, or test dataset splits. It only mentions providing the 'true number of clusters to K-Means, K-Modes and mixture models' for competing methods.
Hardware Specification	No	The paper mentions running times but does not specify any hardware details (e.g., CPU, GPU models, or memory specifications) used for the experiments.
Software Dependencies	No	The paper mentions using the 'pyMix package (Georgi et al., 2010)' and other algorithms/methods but does not provide specific version numbers for any software dependencies.
Experiment Setup	Yes	The only parameter we need is the scale parameter δ. Empirically, we observe δ = 1 is the best choice, although δ = 2 and δ = 3 also work well. For methods that depend on initialization, we run five times and report the mean score. To ensure TMode finishes in a reasonable amount of time, we restrict the tree degree to no more than eight during model estimation and use this degree-restricted tree for TMode method.
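The repeated-initialization protocol quoted under Experiment Setup ("run five times and report the mean score") can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code (the paper releases none). The function name `mean_score_over_restarts`, the use of integer seeds 0 to 4 to obtain five different initializations, and the caller-supplied `cluster_fn` and `score_fn` are all hypothetical; the paper does not say how initializations were varied or which score is averaged.

```python
from statistics import mean
from typing import Callable, Hashable, Sequence


def mean_score_over_restarts(
    cluster_fn: Callable[[int], Sequence[Hashable]],
    score_fn: Callable[[Sequence[Hashable]], float],
    n_runs: int = 5,
) -> float:
    """Hypothetical helper: run an initialization-dependent clustering
    method n_runs times and return the mean of its scores.

    cluster_fn(seed) is assumed to return one cluster label per data point,
    with the seed controlling the random initialization.
    score_fn(labels) is assumed to return a scalar quality score.
    """
    # One run per seed; seeds 0..n_runs-1 are an assumption, not from the paper.
    scores = [score_fn(cluster_fn(seed)) for seed in range(n_runs)]
    # Report the arithmetic mean, as the quoted setup describes.
    return mean(scores)
```

A caller might pass a seeded K-Modes run as `cluster_fn` and an external clustering metric as `score_fn`. Fixing the seeds this way also makes the five runs repeatable, which the paper does not state it did.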