reproducibilityindex.ai

MC-GTA: Metric-Constrained Model-Based Clustering using Goodness-of-fit Tests with Autocorrelations

Authors: Zhangyu Wang, Gengchen Mai, Krzysztof Janowicz, Ni Lao

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform extensive experiments on two synthetic and seven real-world datasets which cover both temporal and spatial clustering tasks. We compare MC-GTA with a wide range of baselines. The detailed experiment setup, baseline algorithms and evaluation metrics can be found in Appendix A.5.
Researcher Affiliation	Collaboration	Zhangyu Wang (1), Gengchen Mai (2, 3), Krzysztof Janowicz (4, 1), Ni Lao (5). (1) Department of Geography, University of California Santa Barbara, CA, USA; (2) Department of Geography, University of Georgia, GA, USA; (3) SEAI Lab, Department of Geography and the Environment, University of Texas at Austin, TX, USA; (4) Faculty of Geosciences, Geography and Astronomy, University of Vienna, Vienna, Austria; (5) Google, Mountain View, CA, USA.
Pseudocode	Yes	Algorithm 1 (MC-GTA Algorithm). Input: a dataset D of N observations {X_i = (f_i ∈ F, p_i ∈ M)}_{i=1}^N; the distance function d_m; the metric penalty function r; the model fitting algorithm GL; the density-based clustering algorithm DB; the number of neighbors n used for model fitting; the metric-constraint strength β; the margin hyperparameter δ. Output: a clustering C = {C_k}_{k=1}^K. Steps: (1) for each observation X_i ∈ D: (2) find the n nearest observations N_i in the metric space; (3) fit the model parameters θ_i ← GL(N_i) (Sec 5.1); (4) for each pair of observations, compute their (5) model dissimilarity d_m(i, j) ← W_2^2(θ_i, θ_j) (Eq 5) and (6) metric distance d_c(i, j) ← d_c(p_i, p_j); (7) compute the empirical generalized semivariogram γ̂_m (Eq 6); (8) fit the theoretical generalized semivariogram γ_m from γ̂_m; (9) compute the range ρ from γ_m (Sec 4.1); (10) compute the loss matrix M^w_{i,j} ← d_m(i, j) + β r_ρ(i, j) (Eq 10); (11) run the density-based clustering algorithm C ← DB(M^w); (12) return C.
Open Source Code	Yes	The implementation of our algorithm and a tutorial is publicized on GitHub via https://github.com/Octopolugal/MC-GTA.git.
Open Datasets	Yes	The detailed information can be found at https://timeseriesclassification.com/description.php?Dataset=AsphaltPavementType. (Pavement Dataset)
Dataset Splits	No	No explicit training/validation/test dataset splits with percentages or absolute sample counts are provided. The paper mentions 'We conduct hyperparameter tuning on the number of neighbors n, the weight β, and the margin δ' which implies a validation process but doesn't specify how data was split for this.
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running experiments are provided in the paper.
Software Dependencies	No	The paper mentions 'sklearn (Pedregosa et al., 2011)' and 'scikit-learn: Machine learning in Python' (in references), indicating Python and scikit-learn are used. It also mentions 'Mclust version 3: An R package' (in references). However, no specific version numbers for these software dependencies are explicitly stated in the paper.
Experiment Setup	Yes	We conduct hyperparameter tuning on the number of neighbors n, the weight β, and the margin δ. The only hyperparameters we preset are cluster number K, feature dimension D, noise scale α and sample batch size k.
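
As a rough illustration of the steps quoted in the Pseudocode row, the following is a minimal sketch and not the authors' implementation. It makes several assumptions that are not taken from the paper: each local model is a Gaussian fitted by sample mean and covariance (standing in for the model fitting algorithm GL), the range ρ is supplied by hand instead of being estimated from a fitted semivariogram, and the penalty is a hypothetical hinge on the metric distance. All function names are illustrative.

```python
import numpy as np


def psd_sqrt(a):
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def w2_squared(mu1, cov1, mu2, cov2):
    """Squared 2-Wasserstein distance between two Gaussians."""
    root2 = psd_sqrt(cov2)
    cross = psd_sqrt(root2 @ cov1 @ root2)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2.0 * cross))


def fit_local_gaussians(features, positions, n_neighbors):
    """Steps 1-3: fit one Gaussian per observation from its nearest neighbours.

    Assumption: sample mean/covariance replaces the paper's fitting routine GL.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    d_c = np.linalg.norm(diff, axis=-1)  # pairwise metric distances
    dim = features.shape[1]
    models = []
    for i in range(len(features)):
        idx = np.argsort(d_c[i])[:n_neighbors]  # includes observation i itself
        block = features[idx]
        cov = np.atleast_2d(np.cov(block, rowvar=False)) + 1e-6 * np.eye(dim)
        models.append((block.mean(axis=0), cov))
    return models, d_c


def loss_matrix(models, d_c, beta, rho, delta):
    """Steps 4-6 and 10: model dissimilarity plus a weighted metric penalty.

    Assumption: the hinge penalty below is a placeholder, not the paper's r.
    """
    n = len(models)
    d_m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d_m[i, j] = d_m[j, i] = w2_squared(*models[i], *models[j])
    penalty = np.maximum(d_c - (rho - delta), 0.0)  # hypothetical form
    return d_m + beta * penalty


# Toy data: 30 observations with 2-D positions and 3-D features.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 10.0, size=(30, 2))
features = rng.normal(size=(30, 3))
models, d_c = fit_local_gaussians(features, positions, n_neighbors=8)
M = loss_matrix(models, d_c, beta=1.0, rho=3.0, delta=0.5)
```

The resulting matrix `M` is symmetric with a zero diagonal, so it could be passed to any density-based clusterer that accepts precomputed dissimilarities, for example scikit-learn's `DBSCAN(metric="precomputed")`. The values of `beta`, `rho`, and `delta` here are arbitrary and would need tuning on real data.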