Dependency Clustering of Mixed Data with Gaussian Mixture Copulas

Authors: Vaibhav Rajan, Sakyajit Bhattacharya

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically demonstrate performance improvements over state-of-the-art methods of correlation clustering on synthetic and benchmark datasets." "Our experimental results demonstrate the efficacy of our method, that outperforms state-of-the-art methods for correlation clustering on synthetic and real benchmark data sets with mixed features, thus illustrating the advantage of our copula-based approach for dependency clustering."
Researcher Affiliation | Industry | Vaibhav Rajan, Sakyajit Bhattacharya, Xerox Research Centre India ({vaibhav.rajan, sakyajit.bhattacharya}@xerox.com)
Pseudocode | Yes | The paper provides Algorithm 1 (EGMCM):

    Algorithm 1 EGMCM
    Input: R(Y), the scaled rank-transformed data Y; G, the number of clusters
    Initialization: Z = Φ^{-1}(R(Y))
    loop
        Estimate, via EM, the GMM parameters θ = [π_g, µ_g, Σ_g]
        Resample Z:
        for j = 1 to p do
            for all y ∈ unique{y_1j, ..., y_nj} do
                Compute z_lj = max{z_ij : y_ij < y} and z_uj = min{z_ij : y < y_ij}
                For each i such that y_ij = y:
                    Sample r_gij from TN(µ_gj, σ_gij, z_lj, z_uj)
                    Set z_ij = Σ_{g=1}^{G} π_g r_gij
            end for
        end for
    end loop
    Output: Cluster labels (latent variables of GMM(Z | θ))
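The pseudocode maps naturally onto standard tooling. Below is a minimal, non-authoritative sketch of the EGMCM loop, assuming scikit-learn's GaussianMixture for the EM step, scipy's truncnorm for the truncated-normal resampling, a diagonal covariance structure, numerically coded features, and a fixed iteration count; the function and variable names (scaled_rank, egmcm, n_iter) are illustrative choices, not details from the paper.

```python
# Illustrative sketch of Algorithm 1 (EGMCM); not the authors' implementation.
import numpy as np
from scipy.stats import norm, truncnorm
from sklearn.mixture import GaussianMixture


def scaled_rank(Y):
    """Scaled rank transform R(Y): column-wise ranks mapped into (0, 1)."""
    n = Y.shape[0]
    ranks = np.argsort(np.argsort(Y, axis=0), axis=0) + 1
    return ranks / (n + 1.0)


def egmcm(Y, G, n_iter=20, seed=0):
    """Cluster a numerically coded n x p data matrix Y into G groups."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    # Initialization: Z = Phi^{-1}(R(Y))
    Z = norm.ppf(scaled_rank(Y))

    for _ in range(n_iter):
        # Estimate, via EM, the GMM parameters theta = [pi_g, mu_g, Sigma_g].
        gmm = GaussianMixture(n_components=G, covariance_type="diag").fit(Z)
        pi, mu, var = gmm.weights_, gmm.means_, gmm.covariances_

        # Resample Z: tied/discrete observations get latent values drawn from
        # truncated normals bounded by their rank neighbours z_lj and z_uj.
        for j in range(p):
            col = Y[:, j]
            for y in np.unique(col):
                below = Z[col < y, j]
                above = Z[col > y, j]
                z_l = below.max() if below.size else -np.inf
                z_u = above.min() if above.size else np.inf
                for i in np.where(col == y)[0]:
                    z_new = 0.0
                    for g in range(G):
                        m, s = mu[g, j], np.sqrt(var[g, j])
                        a, b = (z_l - m) / s, (z_u - m) / s
                        # r_gij ~ TN(mu_gj, sigma_gj, z_l, z_u)
                        r = truncnorm.rvs(a, b, loc=m, scale=s, random_state=rng)
                        z_new += pi[g] * r
                    Z[i, j] = z_new

    # Output: cluster labels = most likely GMM component for each latent row.
    return gmm.predict(Z)
```

A call such as labels = egmcm(Y, G=3) would return one label per row of Y. Note that the sketch treats every column as ordered, which suits numerical and ordinal attributes; how nominal categories should be coded is not something this reconstruction settles.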
Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | Yes | "We compare the performance of our algorithm on 10 benchmark datasets obtained from the UCI repository [Bache and Lichman, 2013]." Table 2 caption: "Details of datasets from UCI repository used in our experiments. n: number of observations, p_num: number of numerical features, p_cat: number of discrete features, G: number of clusters. Asterisk: dataset contains missing values." Reference: [Bache and Lichman, 2013] K. Bache and M. Lichman. UCI Machine Learning Repository, 2013.
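The paper cites the UCI repository but does not say how the data were downloaded or preprocessed. A minimal sketch of fetching one such benchmark today, assuming the ucimlrepo convenience package (not mentioned in the paper) and using Iris (UCI id 53) purely as a stand-in:

```python
# Hypothetical retrieval of a UCI benchmark; the paper only cites
# [Bache and Lichman, 2013] and specifies no download tooling.
from ucimlrepo import fetch_ucirepo

dataset = fetch_ucirepo(id=53)      # any of the paper's 10 datasets could be
X = dataset.data.features           # fetched the same way by id or name
y = dataset.data.targets            # ground-truth class labels, if provided
print(X.shape, y.iloc[:, 0].nunique())  # (n, p) and the number of classes G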
Dataset Splits | No | The paper describes the datasets used (synthetic and UCI benchmark datasets) but does not provide specific details on train, validation, or test splits (e.g., percentages, sample counts, or predefined split references) required for reproduction.
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU/CPU models, processor types, or memory amounts, used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | No | The paper describes simulation settings and variations in parameters such as G (number of clusters) and n (number of observations), but it does not provide specific experimental setup details such as hyperparameter values, optimizer settings, or other system-level training configurations for its algorithms.