Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks
Authors: Luca Arnaboldi, Bruno Loureiro, Ludovic Stephan, Florent Krzakala, Lenka Zdeborová
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All our formal claims are supported by rigorous proofs, as well as numerical experiments; the code developed is available at https://github.com/IdePHICS/Sequence-Single-Index. |
| Researcher Affiliation | Academia | Luca Arnaboldi: IdePHICS Laboratory, EPFL, Lausanne, Switzerland; Bruno Loureiro: Département d'Informatique, École Normale Supérieure, PSL, Paris, France; Ludovic Stephan: ENSAI, Univ. Rennes, Rennes, France; Florent Krzakala: IdePHICS Laboratory, EPFL, Lausanne, Switzerland; Lenka Zdeborová: SPOC Laboratory, EPFL, Lausanne, Switzerland |
| Pseudocode | No | The paper describes theoretical models, derivations, and mathematical equations. It does not present any pseudocode or algorithm blocks. |
| Open Source Code | Yes | All our formal claims are supported by rigorous proofs, as well as numerical experiments; the code developed is available at https://github.com/IdePHICS/Sequence-Single-Index. |
| Open Datasets | No | To derive a sharp characterization of the sample complexity and convergence rate of SGD for the single-layer attention mechanism in eq. (1), we assume that the sequence data (X, y) is generated from the following Gaussian sequence single-index (SSI) model |
| Dataset Splits | No | We assume training data $(X, y) \in \mathbb{R}^{L \times d} \times \mathbb{R}^{k}$ is independently drawn from a Gaussian Sequence Single Index (SSI) model |
| Hardware Specification | Yes | The experiments run on a Mac Studio M2 Ultra, within at most a few hours for the largest ones. |
| Software Dependencies | No | The code is written in Python, using the libraries numpy, scipy, torch and matplotlib. hydra is used to manage the configuration files. |
| Experiment Setup | Yes | The simulations in Figure 5 are performed with d = 1000... In Figure 12 we reproduce Figure 5 for d = 100... The probability is computed over 64 SGD runs... Averaged over 25 runs... The gain is proportional to L, as predicted. $g(z) = \sum_{i=1}^{L} \mathrm{He}_2(z_i)$, d = 1000, σ = ReLU. (Section E.1) $\gamma_{\text{tied}} = \gamma_{\text{untied}} = \gamma_0$, constant in L. ... $\gamma_{\text{tied}} = \gamma_{\text{untied}} = \gamma_0 = 0.005$. ... In our experiments, we used $N_{\text{int}} = 17$, while for the phase diagram in Figure 4 we used $N_{\text{int}} = 19$. |
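The data-model and experiment-setup rows describe SGD on Gaussian SSI data with target $g(z) = \sum_{i=1}^{L} \mathrm{He}_2(z_i)$. The NumPy sketch below illustrates that setting under stated assumptions and is not the authors' code (which lives in the linked repository) nor their attention model. It assumes tokens $x_i \sim \mathcal{N}(0, I_d)$, a single hypothetical teacher direction $w_\star$ with $\lVert w_\star \rVert = \sqrt{d}$, labels $y = \sum_i \mathrm{He}_2(\langle x_i, w_\star\rangle/\sqrt{d})$, and a simplified single-index student $\sum_i \mathrm{He}_2(\langle x_i, w\rangle/\sqrt{d})$ trained by one-pass spherical SGD. The function names `he2`, `random_direction`, `sample_ssi`, `online_sgd` and the values of `gamma`, `d`, `L`, `n_steps` are illustrative choices.

```python
import numpy as np


def he2(z):
    """Probabilists' Hermite polynomial of degree 2: He_2(z) = z**2 - 1."""
    return z**2 - 1.0


def random_direction(d, rng):
    """Gaussian vector rescaled to norm sqrt(d), so <x, v>/sqrt(d) ~ N(0, 1) for x ~ N(0, I_d)."""
    v = rng.standard_normal(d)
    return v * np.sqrt(d) / np.linalg.norm(v)


def sample_ssi(n, L, d, w_star, rng):
    """Hypothetical SSI sampler: n sequences of L i.i.d. N(0, I_d) tokens.

    Assumed label: y = sum_i He_2(z_i) with z_i = <x_i, w_star> / sqrt(d).
    """
    X = rng.standard_normal((n, L, d))
    z = X @ w_star / np.sqrt(d)      # teacher projections, shape (n, L)
    y = he2(z).sum(axis=1)           # labels, shape (n,)
    return X, y


def online_sgd(L, d, gamma, n_steps, rng):
    """One-pass spherical SGD for a simplified student f(X; w) = sum_i He_2(<x_i, w>/sqrt(d)).

    Each step draws a fresh sample, takes a gradient step on 0.5 * (f - y)**2,
    and rescales w back to norm sqrt(d). Returns |m| with m = <w, w_star> / d.
    """
    w_star = random_direction(d, rng)
    w = random_direction(d, rng)
    for _ in range(n_steps):
        X, y = sample_ssi(1, L, d, w_star, rng)
        x = X[0]                                        # tokens, shape (L, d)
        u = x @ w / np.sqrt(d)                          # student projections, shape (L,)
        residual = he2(u).sum() - y[0]
        grad = residual * (2.0 * u) @ x / np.sqrt(d)    # uses He_2'(u) = 2u
        w -= gamma * grad
        w *= np.sqrt(d) / np.linalg.norm(w)             # project back onto the sphere
    return abs(w @ w_star) / d


rng = np.random.default_rng(0)
overlap = online_sgd(L=2, d=100, gamma=0.01, n_steps=30_000, rng=rng)
print(f"final |m| = {overlap:.3f}")
```

Because $\mathrm{He}_2$ is even, $w$ and $-w$ fit the data equally well, so the sketch reports the absolute overlap $|m|$. Renormalizing after each step keeps the student on the sphere of radius $\sqrt{d}$, so $m$ alone tracks progress.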