Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

Authors: Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

NeurIPS 2022

Reproducibility assessment: each variable below lists the result and the supporting LLM response.
Research Type: Experimental
"The analysis for AMP optimality relies on the finite-ρ assumption as n, d → ∞. We thus also investigate the complementary case when the number of non-zero components is of the order s ≲ n and indeed see that sparse principal component analysis (SPCA) and Diagonal thresholding [9] can perform better than random in this region. We find, however, that this requires s ≲ √n (thus ρ = o(1)). We rephrase our findings in terms of existing literature on the subspace clustering for two-classes Gaussian mixtures [10, 11]. ... We discuss the details on the numerical simulations in App. D and the code is available at https://github.com/lucpoisson/SubspaceClustering."
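Diagonal thresholding, mentioned in the excerpt above, is a simple two-step baseline for the sparse regime: retain the coordinates with the largest empirical variances, then diagonalize the covariance restricted to that support. A minimal sketch, assuming the sparsity s is known; the function and normalization choices are illustrative, not taken from the paper's code:

```python
import numpy as np

def diagonal_thresholding(X, s):
    """Sketch of diagonal thresholding for sparse PCA.

    X: (n, d) data matrix with samples as rows.
    s: number of coordinates to retain (assumed known for simplicity).
    Returns a unit d-vector supported on the selected coordinates.
    """
    n, d = X.shape
    # Keep the s coordinates with the largest empirical variance.
    support = np.argsort(X.var(axis=0))[-s:]
    # PCA on the covariance restricted to the retained coordinates.
    sub = X[:, support]
    cov = sub.T @ sub / n
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    v = np.zeros(d)
    v[support] = eigvecs[:, -1]        # leading eigenvector, embedded back
    return v
```

The coordinate-selection step is what exploits sparsity: for s ≲ √n the largest empirical variances concentrate on the true support, which is why this baseline can do better than random in the quoted regime.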
Researcher Affiliation: Academia
"Bruno Loureiro, École Polytechnique Fédérale de Lausanne (EPFL), Information, Learning and Physics lab; Florent Krzakala, École Polytechnique Fédérale de Lausanne (EPFL), Information, Learning and Physics lab; Lenka Zdeborová, École Polytechnique Fédérale de Lausanne (EPFL), Statistical Physics of Computation lab."
Pseudocode: Yes
"Algorithm 1 low-r AMP"
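The paper's Algorithm 1 is a low-rank AMP tailored to the sparse mixture model and is not reproduced here. As a schematic illustration of the general AMP template it instantiates (a matrix step, an entrywise denoiser, and an Onsager memory correction), here is a generic rank-one AMP for a symmetric spiked matrix with a ±1 prior; everything in it is a stand-in, not the paper's update rules:

```python
import numpy as np

def rank_one_amp(Y, n_iter=50, seed=0):
    """Schematic rank-one AMP for Y ≈ (λ/n)·x xᵀ + GOE noise with x_i = ±1.

    The tanh denoiser matches a Rademacher prior; the scaling of its
    argument is kept schematic (a full implementation would track the
    state-evolution parameters). Not the paper's Algorithm 1.
    """
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)            # random (uninformative) initialization
    f_old = np.zeros(n)                   # so the first Onsager term vanishes
    for _ in range(n_iter):
        f = np.tanh(x)                    # posterior-mean denoiser (schematic)
        b = np.mean(1.0 - f ** 2)         # Onsager coefficient: (1/n) Σ f'(x_i)
        x, f_old = Y @ f - b * f_old, f   # matrix step with memory correction
    return np.sign(np.tanh(x))            # ±1 estimate of the spike
```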
Open Source Code: Yes
"We discuss the details on the numerical simulations in App. D and the code is available at https://github.com/lucpoisson/SubspaceClustering."
Open Datasets: No
The paper uses a synthetic model (a k-cluster Gaussian mixture with sparse means) with specified parameters (n, d, s, λ, k) to generate data points. It does not refer to a publicly available dataset or provide access information for one.
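Since the model is fully specified by (n, d, s, λ, k), data generation reduces to a few lines. A sketch under one natural reading of the model; the 1/√s mean normalization and √λ signal strength are assumptions, not the paper's exact scaling:

```python
import numpy as np

def sample_sparse_gmm(n=8000, d=4000, s=400, k=2, lam=3.0, seed=0):
    """Sketch of a k-cluster Gaussian mixture with s-sparse means.

    The 1/sqrt(s) mean normalization and sqrt(lam) signal strength are
    illustrative assumptions; the paper fixes its own scaling.
    """
    rng = np.random.default_rng(seed)
    means = np.zeros((k, d))
    for c in range(k):
        # Each cluster mean has s non-zero coordinates out of d.
        support = rng.choice(d, size=s, replace=False)
        means[c, support] = rng.standard_normal(s) / np.sqrt(s)
    labels = rng.integers(0, k, size=n)                  # uniform assignments
    X = np.sqrt(lam) * means[labels] + rng.standard_normal((n, d))
    return X, labels, means
```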
Dataset Splits: No
The paper conducts numerical simulations to verify theoretical predictions and evaluate algorithm performance, but it does not specify data splits (e.g., train/validation/test sets) in the usual machine-learning sense.
Hardware Specification: No
No specific hardware (e.g., CPU or GPU models, memory, cloud instance types) used to run the experiments is mentioned in the paper.
Software Dependencies: No
No specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) are mentioned in the paper.
Experiment Setup: Yes
"For each algorithm considered the error bars are built using the standard deviation over fifty runs with parameters (n = 8000, d = 4000), i.e. α = 2. We plot in vertical line the theoretical values for the Information-Theoretic threshold λ_it (dashed cyan line), and the algorithmic threshold λ_alg (dotted line in magenta)."
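A minimal sketch of that evaluation protocol: mean and standard deviation of the error over repeated runs for each λ, with the two thresholds drawn as vertical lines in the quoted styles. `run_fn` and the threshold values are placeholders; in practice λ_it and λ_alg come from the paper's theory:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_error_curve(lambdas, run_fn, n_runs=50, lam_it=1.0, lam_alg=1.4):
    """Mean ± std of the error over n_runs runs per λ, plus both thresholds.

    run_fn(lam) is a placeholder returning one run's reconstruction error;
    lam_it / lam_alg stand in for the theoretically computed thresholds.
    """
    errs = np.array([[run_fn(lam) for _ in range(n_runs)] for lam in lambdas])
    plt.errorbar(lambdas, errs.mean(axis=1), yerr=errs.std(axis=1), fmt="o-")
    plt.axvline(lam_it, color="c", linestyle="--", label=r"$\lambda_{\rm it}$")
    plt.axvline(lam_alg, color="m", linestyle=":", label=r"$\lambda_{\rm alg}$")
    plt.xlabel(r"$\lambda$")
    plt.ylabel("reconstruction error")
    plt.legend()
    plt.show()
```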