Clustering High Dimensional Categorical Data via Topographical Features
Authors: Chao Chen, Novi Quadrianto
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our principled method outperforms state-of-the-art clustering methods while also admitting an embarrassingly parallel property. |
| Researcher Affiliation | Academia | Chao Chen (chao.chen@qc.cuny.edu), CUNY Queens College & Graduate Center, New York, NY, USA; Novi Quadrianto (n.quadrianto@sussex.ac.uk), SMiLe CLiNiC, University of Sussex, Brighton, UK |
| Pseudocode | Yes | Algorithm 1 Discrete-Clustering; Algorithm 2 Compute-Next |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use synthetic, UCI and biological datasets. See Table 1 for a summary of different datasets. UCI datasets. We use several categorical datasets from the UCI repository (Lichman, 2013)... Biological datasets. We use DNA barcoding datasets from (Kuksa & Pavlovic, 2009). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits. It only mentions providing the 'true number of clusters to K-Means, K-Modes and mixture models' for competing methods. |
| Hardware Specification | No | The paper mentions running times but does not specify any hardware details (e.g., CPU, GPU models, or memory specifications) used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'pyMix package (Georgi et al., 2010)' and other algorithms/methods but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The only parameter we need is the scale parameter δ. Empirically, we observe δ = 1 is the best choice, although δ = 2 and δ = 3 also work well. For methods that depend on initialization, we run five times and report the mean score. To ensure TMode finishes in a reasonable amount of time, we restrict the tree degree to no more than eight during model estimation and use this degree-restricted tree for TMode method. |
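The scale parameter δ is easier to picture with a small example. The sketch below makes two assumptions that this summary does not confirm. First, δ is treated as a Hamming-distance radius on categorical records. Second, a uniform-ball count stands in for the paper's density estimate. The function names (`hamming`, `ball_density`, `discrete_mode_clustering`) are hypothetical, and the code is not Algorithm 1 or 2 from the paper.

In the sketch, each record points to the densest record within distance δ, and records that climb to the same local mode share a cluster label.

```python
from typing import List, Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def ball_density(data: List[Sequence[str]], delta: int) -> List[int]:
    # ASSUMPTION: stand-in density = number of records within Hamming distance
    # delta (a uniform-ball kernel), not the paper's density model.
    n = len(data)
    return [sum(hamming(data[i], data[j]) <= delta for j in range(n)) for i in range(n)]


def discrete_mode_clustering(data: List[Sequence[str]], delta: int = 1) -> List[int]:
    """Hypothetical hill-climbing clustering; delta is assumed to be a Hamming radius."""
    n = len(data)
    dens = ball_density(data, delta)
    # Each record links to its densest neighbour within delta. Ties are broken
    # toward the smaller index. These searches are independent per record,
    # so this loop could run in parallel.
    parent = list(range(n))
    for i in range(n):
        best = i
        for j in range(n):
            if hamming(data[i], data[j]) <= delta and (dens[j], -j) > (dens[best], -best):
                best = j
        parent[i] = best
    # Follow the links to a fixed point (a local mode). The (density, -index)
    # key strictly increases along every chain, so this loop terminates.
    labels = []
    for i in range(n):
        k = i
        while parent[k] != k:
            k = parent[k]
        labels.append(k)
    return labels


records = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"), ("b", "z", "r"), ("b", "z", "s")]
print(discrete_mode_clustering(records, delta=1))  # [0, 0, 0, 3, 3]
```

With δ = 1, the first three toy records fall under the mode at index 0 and the last two under the mode at index 3. A larger δ widens each neighbourhood, so fewer records remain local modes and the clusters tend to merge.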
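The same row mentions restricting the tree degree to at most eight during model estimation. This summary does not say how the tree is built. The sketch below therefore assumes a Chow-Liu-style tree, meaning a maximum spanning tree over the attributes weighted by pairwise empirical mutual information. It enforces the degree cap with a greedy Kruskal heuristic. The names `mutual_information` and `degree_capped_tree` are hypothetical, and the paper's estimation procedure may differ.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def mutual_information(data: List[Sequence[str]], a: int, b: int) -> float:
    """Empirical mutual information (in nats) between attribute columns a and b."""
    n = len(data)
    ca = Counter(r[a] for r in data)
    cb = Counter(r[b] for r in data)
    cab = Counter((r[a], r[b]) for r in data)
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())


def degree_capped_tree(data: List[Sequence[str]], max_degree: int = 8) -> List[Tuple[int, int]]:
    """Greedy maximum-MI spanning tree in which no attribute exceeds max_degree edges.

    ASSUMPTION: a Chow-Liu-style tree with a greedy degree cap. The cap can
    leave a forest rather than a spanning tree.
    """
    d = len(data[0])
    edges = sorted(
        ((mutual_information(data, a, b), a, b) for a, b in combinations(range(d), 2)),
        reverse=True,
    )
    root = list(range(d))  # union-find parents over attributes

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    degree = [0] * d
    tree: List[Tuple[int, int]] = []
    for _, a, b in edges:
        if degree[a] >= max_degree or degree[b] >= max_degree:
            continue  # adding this edge would break the degree cap
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # adding this edge would create a cycle
        root[ra] = rb
        degree[a] += 1
        degree[b] += 1
        tree.append((a, b))
    return tree
```

The greedy cap is the simplest way to bound degree. Its drawback is that it can skip high-weight edges and leave some attributes disconnected when the cap is very small.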