Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap
Authors: Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The analysis for AMP optimality relies on the finite ρ assumption as n, d → ∞. We thus also investigate the complementary case when the number of non-zero components is of the order s ≪ n and indeed see that sparse principal component analysis (SPCA) and Diagonal thresholding [9] can perform better than random in this region (see the diagonal-thresholding sketch below the table). We find, however, that this requires s ≲ √n (thus ρ = o(1)). We rephrase our findings in terms of the existing literature on subspace clustering for two-class Gaussian mixtures [10, 11]. ... We discuss the details on the numerical simulations in App. D and the code is available at https://github.com/lucpoisson/SubspaceClustering. |
| Researcher Affiliation | Academia | Bruno Loureiro, École Polytechnique Fédérale de Lausanne (EPFL), Information, Learning and Physics lab; Florent Krzakala, École Polytechnique Fédérale de Lausanne (EPFL), Information, Learning and Physics lab; Lenka Zdeborová, École Polytechnique Fédérale de Lausanne (EPFL), Statistical Physics of Computation lab. |
| Pseudocode | Yes | Algorithm 1 low-r AMP |
| Open Source Code | Yes | We discuss the details on the numerical simulations in App. D and the code is available at https://github.com/lucpoisson/SubspaceClustering. |
| Open Datasets | No | The paper uses a synthetic model (a k-cluster Gaussian mixture with sparse means) with specified parameters (n, d, s, λ, k) to generate data points. It does not refer to a publicly available dataset or provide access information for one. (A hedged data-generation sketch appears below the table.) |
| Dataset Splits | No | The paper conducts numerical simulations to verify theoretical predictions and evaluate algorithm performance, but it does not specify data splits (e.g., train/validation/test sets) for model training in the typical machine learning sense. |
| Hardware Specification | No | No specific hardware (e.g., CPU, GPU models, memory, cloud instance types) used for running the experiments is mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) are mentioned in the paper. |
| Experiment Setup | Yes | For each algorithm considered, the error bars are built using the standard deviation over fifty runs with parameters (n = 8000, d = 4000), i.e. α = 2. We plot as vertical lines the theoretical values for the information-theoretic threshold λ_it (dashed cyan line) and the algorithmic threshold λ_alg (dotted magenta line). (See the protocol sketch below the table.) |
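
The rows above quote the paper's synthetic setup; for concreteness, here is a minimal data-generation sketch for the k-cluster Gaussian mixture with s-sparse means. The function name `sample_sparse_gmm`, the `1/np.sqrt(s)` mean normalization, and the `np.sqrt(lam)` SNR scaling are illustrative assumptions rather than the paper's exact conventions (those are spelled out in App. D).

```python
import numpy as np

def sample_sparse_gmm(n=8000, d=4000, k=2, s=400, lam=1.0, rng=None):
    """Draw n points in R^d from a k-cluster Gaussian mixture whose means
    are s-sparse. The sqrt(lam) scaling is an assumed SNR convention."""
    rng = np.random.default_rng(rng)
    means = np.zeros((k, d))
    for a in range(k):
        support = rng.choice(d, size=s, replace=False)            # random s-sparse support
        means[a, support] = rng.standard_normal(s) / np.sqrt(s)   # roughly unit-norm means
    labels = rng.integers(k, size=n)                              # uniform cluster labels
    X = np.sqrt(lam) * means[labels] + rng.standard_normal((n, d))
    return X, labels, means
```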
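The Research Type row mentions that Diagonal thresholding [9] can beat random guessing once s ≲ √n. The sketch below implements the classic Johnstone–Lu-style variant (select the coordinates with the largest sample variance, then run PCA on the restricted data); the variant analyzed in the paper may differ in its thresholding rule.

```python
import numpy as np

def diagonal_thresholding(X, s):
    """Keep the s coordinates with the largest sample variance, take the
    top principal component of the restricted data, and embed it back
    into R^d. A Johnstone-Lu-style sketch, not the paper's exact variant."""
    variances = X.var(axis=0)
    support = np.argsort(variances)[-s:]          # top-s diagonal entries
    cov = np.cov(X[:, support], rowvar=False)     # restricted sample covariance
    _, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    v = np.zeros(X.shape[1])
    v[support] = eigvecs[:, -1]                   # leading eigenvector
    return v / np.linalg.norm(v)
```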
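Finally, a sketch of the quoted error-bar protocol: fifty independent runs at (n, d) = (8000, 4000), i.e. α = 2, with the error bar given by the standard deviation across runs. It reuses the two hypothetical helpers above, and the cosine `overlap` metric (with an arbitrary λ = 4.0) is a stand-in for the overlap order parameter studied in the paper.

```python
import numpy as np

def overlap(v_hat, v_true):
    """Absolute cosine similarity between estimated and true mean direction."""
    return abs(v_hat @ v_true) / (np.linalg.norm(v_hat) * np.linalg.norm(v_true))

def run_once(seed, n=8000, d=4000, k=2, s=400, lam=4.0):
    X, _, means = sample_sparse_gmm(n, d, k, s, lam, rng=seed)
    v_hat = diagonal_thresholding(X, s)
    return overlap(v_hat, means[0])

# Fifty runs at (n, d) = (8000, 4000), i.e. alpha = n/d = 2; the error bar
# is the standard deviation across runs, as in the paper's protocol.
scores = np.array([run_once(seed) for seed in range(50)])
print(f"overlap: {scores.mean():.3f} +/- {scores.std():.3f}")
```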