Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Curl Descent: Non-Gradient Learning Dynamics with Sign-Diverse Plasticity

Authors: Hugo Ninou, Jonathan Kadmon, N. Alex Cayco-Gajic

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop a theoretical framework to isolate and systematically analyze the effect of curl terms in large linear feedforward networks. Leveraging random matrix theory, we identify a dynamical phase transition in which the zero-error solution manifold loses stability. We demonstrate that the location of this phase transition depends on architectural parameters, particularly the expansion ratio between the input and hidden layers. Finally, we provide numerical evidence that, in certain nonlinear architectures, curl descent can accelerate learning, even in the absence of true gradient flow.
Researcher Affiliation | Academia | Hugo Ninou, Département d'Études Cognitives, École normale supérieure, PSL; Jonathan Kadmon, Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem; N. Alex Cayco-Gajic, Département d'Études Cognitives, École normale supérieure, PSL
Pseudocode | No | The paper describes methods and equations but does not present any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code accompanying our paper is available at https://github.com/caycogajiclab/Curl_Descent.
Open Datasets | No | Inputs x_i were sampled i.i.d. from N(0, 1/√2) and, along with the teacher's outputs, provided the training data for the students.
Dataset Splits | No | Weight updates were made on the whole training set (N_train = 250 samples) with a learning rate of 0.1/N_train over N_epochs = 10^5 epochs.
Hardware Specification | Yes | Compute resources: 4 hours on 500 CPUs (local cluster).
Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., library names, framework versions).
Experiment Setup | Yes | Weight updates were made on the whole training set (N_train = 250 samples) with a learning rate of 0.1/N_train over N_epochs = 10^5 epochs. To ensure numerical stability, W1 and W2 were re-normalized at every epoch to match their initial Frobenius norm. When analyzing the stability regimes, we focused on linear networks to be able to compare directly to theory; however, we note that the qualitative properties also extend to nonlinear networks (see Supplementary Material C). Furthermore, in our final results on convergence speed in curl descent we implemented nonlinear networks with tanh activation functions. ... The hidden and read-out weights of the teacher were sampled i.i.d. from zero-mean distributions, with variance scaled by the number of input neurons to each layer, ensuring that stability depends only on the compression ratio c = M/N and not on the statistics of the weights. The student networks had identical architectures to the teacher networks, with weights initialized from the same distribution (unless otherwise specified). Inputs x_i were sampled i.i.d. from N(0, 1/√2) and, along with the teacher's outputs, provided the training data for the students.
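The quoted training protocol (teacher-student pair, fan-in-scaled initialization, full-batch updates with learning rate 0.1/N_train, per-epoch Frobenius re-normalization, optional tanh) can be sketched roughly as below. This is a minimal stdlib-only illustration under stated assumptions, not the authors' code (that is in the linked repository). Assumptions: N inputs and M hidden units (so c = M/N), a hypothetical output size K, a summed squared-error loss, and reading the second argument of N(0, 1/√2) as a variance. The defaults are much smaller than the paper's N_train = 250 and 10^5 epochs so that it runs quickly. The `signs` argument is a hypothetical per-hidden-unit sign flip showing how a sign-diverse rule can make the update non-gradient; it is not claimed to be the paper's curl-descent rule.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def frobenius(W):
    return math.sqrt(sum(w * w for row in W for w in row))


def init_weights(rng, n_out, n_in):
    # Zero-mean Gaussian entries with variance 1/fan_in
    # ("variance scaled by the number of input neurons to each layer").
    std = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def forward(W1, W2, x, nonlinear):
    a = matvec(W1, x)                                 # hidden pre-activations
    h = [math.tanh(v) for v in a] if nonlinear else a  # tanh or linear
    return a, h, matvec(W2, h)


def train_student(N=8, M=4, K=2, n_train=50, n_epochs=200,
                  nonlinear=False, signs=None, seed=0):
    """Full-batch teacher-student training; returns per-epoch mean losses.

    `signs` is HYPOTHETICAL: one +/-1 factor per hidden unit applied to the
    back-propagated error. All +1 recovers plain gradient descent.
    """
    rng = random.Random(seed)
    signs = signs or [1.0] * M
    # Teacher and student share the architecture and weight distribution.
    T1, T2 = init_weights(rng, M, N), init_weights(rng, K, M)
    W1, W2 = init_weights(rng, M, N), init_weights(rng, K, M)
    norm1, norm2 = frobenius(W1), frobenius(W2)
    # ASSUMPTION: N(0, 1/sqrt(2)) read as variance, so std = 2**-0.25.
    x_std = 2 ** -0.25
    X = [[rng.gauss(0.0, x_std) for _ in range(N)] for _ in range(n_train)]
    Y = [forward(T1, T2, x, nonlinear)[2] for x in X]
    lr = 0.1 / n_train  # ASSUMPTION: applied to the summed (not averaged) gradient
    losses = []
    for _ in range(n_epochs):
        G1 = [[0.0] * N for _ in range(M)]
        G2 = [[0.0] * M for _ in range(K)]
        total = 0.0
        for x, y in zip(X, Y):
            a, h, y_hat = forward(W1, W2, x, nonlinear)
            e = [p - t for p, t in zip(y_hat, y)]
            total += 0.5 * sum(v * v for v in e)
            for k in range(K):
                for j in range(M):
                    G2[k][j] += e[k] * h[j]
            for j in range(M):
                back = sum(W2[k][j] * e[k] for k in range(K))
                deriv = 1.0 - h[j] ** 2 if nonlinear else 1.0
                delta = signs[j] * back * deriv  # hypothetical sign flip
                for i in range(N):
                    G1[j][i] += delta * x[i]
        losses.append(total / n_train)  # loss before this epoch's update
        W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, G1)]
        W2 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W2, G2)]
        # Re-normalize each layer to its initial Frobenius norm.
        s1, s2 = norm1 / frobenius(W1), norm2 / frobenius(W2)
        W1 = [[w * s1 for w in row] for row in W1]
        W2 = [[w * s2 for w in row] for row in W2]
    return losses, W1, W2, norm1, norm2


losses, W1, W2, n1, n2 = train_student()
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

With every entry of `signs` equal to +1 the update is full-batch gradient descent followed by rescaling each layer back to its initial norm. Flipping some entries (e.g. `signs=[1, 1, -1, -1]`) gives an update that is generally no longer the gradient of the loss, which is the kind of non-gradient regime the paper studies.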