Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Optimal Spectral Transitions in High-Dimensional Multi-Index Models

Authors: Leonardo Defilippis, Yatin Dandi, Pierre Mergny, Florent Krzakala, Bruno Loureiro

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Supported by numerical experiments and a rigorous theoretical framework, our work bridges critical gaps in the computational limits of weak learnability in multi-index models. In this section we illustrate the framework introduced in Section 2 to predict the asymptotic performance of the spectral estimators (10) and (12) for specific examples of link functions, providing a comparison between our asymptotic analytical results and finite-size numerical simulations for the overlap between the spectral estimators and the weights W.
Researcher Affiliation	Academia	(1) Département d'Informatique, École Normale Supérieure, PSL & CNRS, Paris, France; (2) Information, Learning and Physics Laboratory, EPFL, CH-1015 Lausanne, Switzerland; (3) Sloan School of Management, MIT, United States.
Pseudocode	Yes	Appendix A (Generalized Approximate Message Passing algorithms): In this section we present a general version of the multi-dimensional Generalized Approximate Message Passing (GAMP) algorithm [32], defined as the iterations $\Omega^t = X f^t_{\mathrm{in}}(B^t) - g^{t-1}_{\mathrm{out}}(\Omega^{t-1}, y)\, V_t^\top$ (40), $B^{t+1} = X^\top g^t_{\mathrm{out}}(\Omega^t, y) - f^t_{\mathrm{in}}(B^t)\, A_t^\top$ (41).
Open Source Code	No	We judge the code too simple to be released, and we provide enough information for the reproducibility of the numerical plots. All data sets used in the experiments are synthetic.
Open Datasets	No	All data sets used in the experiments are synthetic.
Dataset Splits	No	The paper describes the generation of synthetic data for simulations (e.g., 'n i.i.d. samples (x_i, y_i)') and the scale of simulations (e.g., 'n = 5000', 'd = 5000'). However, it does not specify explicit training/test/validation splits, as the numerical experiments primarily illustrate theoretical results.
Hardware Specification	No	All experiments are simple enough to be run on a standard laptop in a few hours.
Software Dependencies	No	No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup	Yes	In Figure 4 we compare these theoretical predictions to numerical simulations at finite dimensions, respectively for the link functions g(z_1, z_2) = z_1 z_2 and g(z) = p 1 z 2. Additional numerical experiments are presented in Appendix E. The dots represent numerical simulation results, computed for n = 5000 (for the asymmetric method) or d = 5000 (for the symmetric method) and averaged over 10 instances.
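The GAMP iterations (40) and (41) quoted in the Pseudocode row can be illustrated with the NumPy sketch below. It is a hypothetical skeleton, not the authors' code: it only shows the matrix bookkeeping, assuming X has shape (n, d), B has shape (d, p) and Ω has shape (n, p). The denoisers f_in and g_out and the p × p Onsager matrices V_t and A_t are supplied by the caller because their definitions are not reproduced on this page. The choice g_out^{-1} = 0 at the first step is also an assumption.

```python
import numpy as np

def gamp_iterations(X, y, f_in, g_out, onsager_V, onsager_A, B0, n_iter=10):
    """Hypothetical skeleton of the multi-dimensional GAMP iterations (40)-(41).

    X: (n, d) data matrix; y: labels, passed through to g_out; B0: (d, p).
    Caller-supplied callables (assumptions, not from the paper):
      f_in(B, t)            -> (d, p)  input denoiser f_in^t
      g_out(Omega, y, t)    -> (n, p)  output denoiser g_out^t
      onsager_V(B, t)       -> (p, p)  matrix V_t
      onsager_A(Omega, y, t)-> (p, p)  matrix A_t
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    n = X.shape[0]
    B = np.asarray(B0, dtype=float)
    p = B.shape[1]
    g_prev = np.zeros((n, p))  # assumption: g_out^{-1} = 0 at the first step
    for t in range(n_iter):
        F = f_in(B, t)                    # f_in^t(B^t), shape (d, p)
        V_t = onsager_V(B, t)             # V_t, shape (p, p)
        Omega = X @ F - g_prev @ V_t.T    # eq. (40): Omega^t
        G = g_out(Omega, y, t)            # g_out^t(Omega^t, y), shape (n, p)
        A_t = onsager_A(Omega, y, t)      # A_t, shape (p, p)
        B = X.T @ G - F @ A_t.T           # eq. (41): B^{t+1}
        g_prev = G                        # reused in the Onsager term of the next step
    return B, Omega
```

Keeping the denoisers as callables separates the model-specific parts (the link function and the prior) from the fixed iteration structure. Any concrete choice of f_in, g_out, V_t and A_t would have to follow the definitions in the paper's Appendix A.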
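The kind of finite-size simulation described in the Experiment Setup row can be sketched at a smaller scale. The sketch below does not implement the paper's estimators (10) and (12), whose definitions are not reproduced on this page. As a stand-in it uses a vanilla spectral method: it takes the eigenvectors of largest-magnitude eigenvalue of D = (1/n) Σ_i y_i x_i x_iᵀ, for the link g(z_1, z_2) = z_1 z_2. The following choices are all assumptions: Gaussian inputs, weights W with orthonormal columns, noiseless labels, the overlap measure ‖UᵀW‖²_F / p, and the function names. For this link, E[y x xᵀ] = w_1 w_2ᵀ + w_2 w_1ᵀ, which has eigenvalues ±1 along (w_1 ± w_2)/√2. That is why the two eigenvalues of largest magnitude are kept.

```python
import numpy as np

def product_link(z):
    """Link g(z1, z2) = z1 * z2, one of the examples named above (p = 2)."""
    return z[:, 0] * z[:, 1]

def spectral_overlap_once(n, d, rng, p=2):
    """One synthetic instance of a vanilla spectral method (not the paper's (10)/(12))."""
    # Hidden weights with orthonormal columns (assumed normalization).
    W, _ = np.linalg.qr(rng.standard_normal((d, p)))
    X = rng.standard_normal((n, d))      # n i.i.d. Gaussian inputs x_i
    y = product_link(X @ W)              # noiseless labels y_i = g(W^T x_i)
    D = (X.T * y) @ X / n                # D = (1/n) sum_i y_i x_i x_i^T
    evals, evecs = np.linalg.eigh(D)     # D is symmetric
    idx = np.argsort(np.abs(evals))[-p:] # p eigenvalues of largest magnitude
    U = evecs[:, idx]                    # (d, p) orthonormal estimate of span(W)
    # Rotation-invariant overlap in [0, 1]: ||U^T W||_F^2 / p.
    return np.linalg.norm(U.T @ W) ** 2 / p

def averaged_overlap(n, d, n_instances=10, seed=0):
    """Mean and standard deviation of the overlap over independent instances."""
    rng = np.random.default_rng(seed)
    vals = [spectral_overlap_once(n, d, rng) for _ in range(n_instances)]
    return float(np.mean(vals)), float(np.std(vals))
```

For example, `averaged_overlap(n=10000, d=50, n_instances=3)` should return a mean overlap well above the random-subspace baseline of about p/d. The default of 10 instances mirrors the averaging mentioned in the text. The sample ratio n/d needed for a nonzero overlap depends on the estimator, and this simple method is not expected to reach the thresholds analyzed in the paper.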