Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
Authors: Yuandong Tian
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that around 95% of the solutions obtained by gradient descent match exactly our theoretical constructions. Although the global solutions constructed only required a small number of hidden nodes, our analysis on gradient dynamics shows that overparameterization asymptotically decouples training dynamics and is beneficial. We further show that training dynamics favors simpler solutions under weight decay, and thus high-order global solutions such as perfect memorization are unfavorable. The code is open sourced. ... In Sec. 7 we show that the gradient descent solutions match exactly with our theoretical construction. ... 7 Experiments. Setup. We train the 2-layer MLP on the modular addition task, which is a special case of outcome prediction of Abelian group multiplication. We use Adam optimizer with learning rate 0.01, MSE loss, and train for 10000 epochs with weight decays. We tested on \|G\| = d ∈ {23, 71, 127}. All data are generated synthetically and training/test split is 90%/10%. Each training with a fixed set of hyperparameter configuration is conducted on NVIDIA V100 for a few minutes. Solution Distributions. As shown in Fig. 3, we see order-4 and order-6 solutions in each frequency emerging from well-trained networks on d = 23. The mixed solution z_{F4/6} can be clearly observed in a small-scale example (Fig. 6). This is also true for larger d (Fig. 4). |
| Researcher Affiliation | Industry | Yuandong Tian Meta Superintelligence Lab (FAIR) EMAIL |
| Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' section, nor does it present structured steps in a code-like format. Figure 1 provides an overview diagram, not pseudocode. |
| Open Source Code | Yes | The code is open sourced. Footnote 1: https://github.com/facebookresearch/luckmatters/tree/yuandong3/ssl/real-dataset |
| Open Datasets | No | All data are generated synthetically and training/test split is 90%/10%. |
| Dataset Splits | Yes | All data are generated synthetically and training/test split is 90%/10%. |
| Hardware Specification | Yes | Each training with a fixed set of hyperparameter configuration is conducted on NVIDIA V100 for a few minutes. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We use Adam optimizer with learning rate 0.01, MSE loss, and train for 10000 epochs with weight decays. We tested on \|G\| = d ∈ {23, 71, 127}. |
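
The quoted setup (synthetic modular addition data, a 90%/10% training/test split, group sizes d ∈ {23, 71, 127}) can be illustrated with a minimal data-generation sketch. This is not the authors' code: the function names, the choice to enumerate all d² input pairs, the fixed shuffle seed, and the rounding of the split size are assumptions made here for illustration. The `CONFIG` dictionary only records the hyperparameters quoted in the table and is not consumed by a training loop.

```python
import random

# Hyperparameters as quoted in the table above (recorded for reference only).
CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 0.01,
    "loss": "MSE",
    "epochs": 10000,
    "group_sizes": (23, 71, 127),
}


def modular_addition_dataset(d):
    """Return every pair (a, b) with 0 <= a, b < d, labeled (a + b) mod d.

    Assumption: the full d*d table is enumerated; the source only says the
    data are generated synthetically.
    """
    return [((a, b), (a + b) % d) for a in range(d) for b in range(d)]


def split_dataset(examples, train_fraction=0.9, seed=0):
    """Shuffle with a fixed seed (hypothetical) and split into train/test."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def one_hot(index, d):
    """One-hot encode a group element; a common choice, assumed here."""
    vec = [0.0] * d
    vec[index] = 1.0
    return vec


if __name__ == "__main__":
    for d in CONFIG["group_sizes"]:
        train, test = split_dataset(modular_addition_dataset(d))
        print(f"d={d}: {len(train)} training pairs, {len(test)} test pairs")
```

Fixing the seed makes the split repeatable across runs, which matters when comparing solutions obtained under different weight-decay settings on the same held-out pairs.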