Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

Authors: Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

NeurIPS 2022

Reproducibility assessment: each variable below lists the result and the supporting LLM response.
Research Type: Experimental
"The analysis for AMP optimality relies on the finite-ρ assumption as n, d → ∞. We thus also investigate the complementary case when the number of non-zero components is of the order s ≲ n and indeed see that sparse principal component analysis (SPCA) and Diagonal thresholding [9] can perform better than random in this region. We find, however, that this requires s ≲ √n (thus ρ = o(1)). We rephrase our findings in terms of existing literature on the subspace clustering for two-classes Gaussian mixtures [10, 11]. ... We discuss the details on the numerical simulations in App. D and the code is available at https://github.com/lucpoisson/SubspaceClustering."
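Diagonal thresholding, mentioned in the excerpt above, is a simple two-step baseline for the sparse regime: retain the coordinates with the largest empirical variances, then diagonalize the covariance restricted to that support. A minimal sketch, assuming the sparsity s is known; the function and normalization choices are illustrative, not taken from the paper's code:

```python
import numpy as np

def diagonal_thresholding(X, s):
    """Sketch of diagonal thresholding for sparse PCA.

    X: (n, d) data matrix with samples as rows.
    s: number of coordinates to retain (assumed known for simplicity).
    Returns a unit d-vector supported on the selected coordinates.
    """
    n, d = X.shape
    # Keep the s coordinates with the largest empirical variance.
    support = np.argsort(X.var(axis=0))[-s:]
    # PCA on the covariance restricted to the retained coordinates.
    sub = X[:, support]
    cov = sub.T @ sub / n
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    v = np.zeros(d)
    v[support] = eigvecs[:, -1]        # leading eigenvector, embedded back
    return v
```

The coordinate-selection step is what exploits sparsity: for s ≲ √n the largest empirical variances concentrate on the true support, which is why this baseline can do better than random in the quoted regime.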
Researcher Affiliation: Academia
"Bruno Loureiro, École Polytechnique Fédérale de Lausanne (EPFL), Information, Learning and Physics lab; Florent Krzakala, École Polytechnique Fédérale de Lausanne (EPFL), Information, Learning and Physics lab; Lenka Zdeborová, École Polytechnique Fédérale de Lausanne (EPFL), Statistical Physics of Computation lab."
Pseudocode: Yes
"Algorithm 1 low-r AMP"
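The paper's Algorithm 1 is a low-rank AMP tailored to the sparse mixture model and is not reproduced here. As a schematic illustration of the general AMP template it instantiates (a matrix step, an entrywise denoiser, and an Onsager memory correction), here is a generic rank-one AMP for a symmetric spiked matrix with a ±1 prior; everything in it is a stand-in, not the paper's update rules:

```python
import numpy as np

def rank_one_amp(Y, n_iter=50, seed=0):
    """Schematic rank-one AMP for Y ≈ (λ/n)·x xᵀ + GOE noise with x_i = ±1.

    The tanh denoiser matches a Rademacher prior; the scaling of its
    argument is kept schematic (a full implementation would track the
    state-evolution parameters). Not the paper's Algorithm 1.
    """
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)            # random (uninformative) initialization
    f_old = np.zeros(n)                   # so the first Onsager term vanishes
    for _ in range(n_iter):
        f = np.tanh(x)                    # posterior-mean denoiser (schematic)
        b = np.mean(1.0 - f ** 2)         # Onsager coefficient: (1/n) Σ f'(x_i)
        x, f_old = Y @ f - b * f_old, f   # matrix step with memory correction
    return np.sign(np.tanh(x))            # ±1 estimate of the spike
```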
Open Source Code: Yes
"We discuss the details on the numerical simulations in App. D and the code is available at https://github.com/lucpoisson/SubspaceClustering."
Open Datasets: No
The paper uses a synthetic model (a k-cluster Gaussian mixture with sparse means) with specified parameters (n, d, s, λ, k) to generate data points. It does not refer to a publicly available dataset or provide access information for one.
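Since the model is fully specified by (n, d, s, λ, k), data generation reduces to a few lines. A sketch under one natural reading of the model; the 1/√s mean normalization and √λ signal strength are assumptions, not the paper's exact scaling:

```python
import numpy as np

def sample_sparse_gmm(n=8000, d=4000, s=400, k=2, lam=3.0, seed=0):
    """Sketch of a k-cluster Gaussian mixture with s-sparse means.

    The 1/sqrt(s) mean normalization and sqrt(lam) signal strength are
    illustrative assumptions; the paper fixes its own scaling.
    """
    rng = np.random.default_rng(seed)
    means = np.zeros((k, d))
    for c in range(k):
        # Each cluster mean has s non-zero coordinates out of d.
        support = rng.choice(d, size=s, replace=False)
        means[c, support] = rng.standard_normal(s) / np.sqrt(s)
    labels = rng.integers(0, k, size=n)                  # uniform assignments
    X = np.sqrt(lam) * means[labels] + rng.standard_normal((n, d))
    return X, labels, means
```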
Dataset Splits: No
The paper conducts numerical simulations to verify theoretical predictions and evaluate algorithm performance, but it does not specify data splits (e.g., train/validation/test sets) in the usual machine-learning sense.
Hardware Specification: No
No specific hardware (e.g., CPU or GPU models, memory, cloud instance types) used to run the experiments is mentioned in the paper.
Software Dependencies: No
No specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) are mentioned in the paper.
Experiment Setup: Yes
"For each algorithm considered the error bars are built using the standard deviation over fifty runs with parameters (n = 8000, d = 4000), i.e. α = 2. We plot in vertical line the theoretical values for the Information-Theoretic threshold λ_it (dashed cyan line), and the algorithmic threshold λ_alg (dotted line in magenta)."
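A minimal sketch of that evaluation protocol: mean and standard deviation of the error over repeated runs for each λ, with the two thresholds drawn as vertical lines in the quoted styles. `run_fn` and the threshold values are placeholders; in practice λ_it and λ_alg come from the paper's theory:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_error_curve(lambdas, run_fn, n_runs=50, lam_it=1.0, lam_alg=1.4):
    """Mean ± std of the error over n_runs runs per λ, plus both thresholds.

    run_fn(lam) is a placeholder returning one run's reconstruction error;
    lam_it / lam_alg stand in for the theoretically computed thresholds.
    """
    errs = np.array([[run_fn(lam) for _ in range(n_runs)] for lam in lambdas])
    plt.errorbar(lambdas, errs.mean(axis=1), yerr=errs.std(axis=1), fmt="o-")
    plt.axvline(lam_it, color="c", linestyle="--", label=r"$\lambda_{\rm it}$")
    plt.axvline(lam_alg, color="m", linestyle=":", label=r"$\lambda_{\rm alg}$")
    plt.xlabel(r"$\lambda$")
    plt.ylabel("reconstruction error")
    plt.legend()
    plt.show()
```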