reproducibilityindex.ai

Achieving Optimal Clustering in Gaussian Mixture Models with Anisotropic Covariance Structures

Authors: Xin Chen, Anderson Ye Zhang

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Section 4, Numerical Studies: In this section, we compare the performance of our methods with other popular clustering methods on synthetic and real datasets under different settings.
Researcher Affiliation	Academia	Xin Chen, Princeton University, xc5557@princeton.edu; Anderson Ye Zhang, University of Pennsylvania, ayz@wharton.upenn.edu
Pseudocode	Yes	Algorithm 1: Adjusted Lloyd's Algorithm for Model 1. and Algorithm 2: Adjusted Lloyd's Algorithm for Model 2.
Open Source Code	No	The paper does not contain any explicit statement about making its source code available or a direct link to a code repository.
Open Datasets	Yes	To further demonstrate the effectiveness of our methods, we conduct experiments using the Fashion-MNIST dataset [23].
Dataset Splits	No	The paper conducts numerical studies on synthetic and real datasets (Fashion-MNIST) but does not specify the explicit training, validation, and test dataset splits used for these experiments.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to conduct the experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, specific libraries).
Experiment Setup	Yes	In this section, we compare the performance of our methods with other popular clustering methods on synthetic and real datasets under different settings. ... We independently generate n = 1200 samples with dimension d = 50 from k = 30 clusters. Each cluster has 40 samples. We set Σ = UᵀΛU, where Λ is a 50 × 50 diagonal matrix with diagonal elements selected from 0.5 to 8 with equal space and U is a randomly generated orthogonal matrix. The centers {θ_a}, a ∈ [n], are orthogonal to each other with ‖θ_1‖ = ... = ‖θ_30‖ = 9. and In this case, we take n = 1200, k = 2, and d = 9. We set Σ_1 = I_d and Σ_2 = Λ_2, a diagonal matrix where the first diagonal entry is 0.5 and the remaining entries are 5. We set the cluster sizes to be 900 and 300, respectively. To simplify the calculation of SNR′, we set θ_1 = 0 and θ_2 = 5e_1... and Additionally, the dashed lines in the left and right panels represent the optimal exponents SNR²/8 and SNR′²/8 of the minimax bounds, respectively. It is observed that both Algorithm 1 and Algorithm 2 meet these benchmarks after three iterations. and we apply PCA to reduce dimensionality from 784 to 50 by retaining the top 50 principal components.
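
Since the paper releases no code, the two synthetic settings quoted under Experiment Setup can be approximated with a short NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation. The function names `make_model1` and `make_model2`, the seeds, and the use of QR decomposition to draw the random orthogonal matrix and the orthogonal centers are all hypothetical choices. The sketch also reads the center condition as a constraint on Euclidean norms (each center has norm 9) and reads "5e1" as 5 times the first standard basis vector.

```python
import numpy as np


def make_model1(n=1200, d=50, k=30, center_norm=9.0, seed=0):
    """Shared-covariance setting: k equal-size clusters, Sigma = U^T Lambda U."""
    rng = np.random.default_rng(seed)
    # Assumption: U is drawn via QR of a Gaussian matrix (the paper only says "random").
    U, _ = np.linalg.qr(rng.standard_normal((d, d)))
    lam = np.linspace(0.5, 8.0, d)          # equally spaced diagonal of Lambda
    Sigma = U.T @ np.diag(lam) @ U
    # Assumption: orthogonal centers are scaled orthonormal columns of a second QR factor.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    centers = center_norm * Q[:, :k].T      # shape (k, d), rows mutually orthogonal
    labels = np.repeat(np.arange(k), n // k)
    # z @ (diag(sqrt(lam)) @ U) has covariance U^T Lambda U = Sigma.
    noise = rng.standard_normal((n, d)) @ (np.sqrt(lam)[:, None] * U)
    return centers[labels] + noise, labels, centers, Sigma


def make_model2(sizes=(900, 300), d=9, seed=0):
    """Two-cluster setting with different covariances Sigma_1 = I and Sigma_2 = Lambda_2."""
    rng = np.random.default_rng(seed)
    diag2 = np.full(d, 5.0)
    diag2[0] = 0.5                          # first diagonal entry 0.5, the rest 5
    theta1 = np.zeros(d)
    theta2 = np.zeros(d)
    theta2[0] = 5.0                         # theta_2 = 5 * e_1
    X1 = theta1 + rng.standard_normal((sizes[0], d))
    X2 = theta2 + rng.standard_normal((sizes[1], d)) * np.sqrt(diag2)
    X = np.vstack([X1, X2])
    labels = np.concatenate([np.zeros(sizes[0], dtype=int), np.ones(sizes[1], dtype=int)])
    return X, labels, (theta1, theta2), (np.eye(d), np.diag(diag2))
```

Calling `make_model1()` returns a 1200 × 50 data matrix with 40 points per cluster, and `make_model2()` returns a 1200 × 9 matrix with cluster sizes 900 and 300. Any clustering method can then be scored against the returned labels.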