Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning
Authors: Junsoo Oh, Jerry Song, Chulhee Yun
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to support our findings, using NVIDIA RTX A6000 GPUs. We perform experiments in our setting described in Section 2. We also provide empirical results using a real-world dataset MNIST. |
| Researcher Affiliation | Academia | Junsoo Oh, Jerry Song, Chulhee Yun (KAIST) |
| Pseudocode | No | The paper describes update rules for model parameters (Equations 1 and 2) but does not present them within a structured pseudocode or algorithm block. |
| Open Source Code | No | We do not provide code in supplemental material. However, our results in synthetic data and MNIST data can be easily reproduced since we opened all details. |
| Open Datasets | Yes | We also provide empirical results using a real-world dataset MNIST. |
| Dataset Splits | Yes | We first train the weak model using n_wk = 5000 true-labeled data points. ... We use three different values for the number of data points, n_st = 75, 2000, 20000. ... Then, we train the strong model using labels predicted by the trained weak model, with varying numbers of training samples n_st = 500, 1000, 1500, 2000, 2500. |
| Hardware Specification | Yes | We conduct experiments to support our findings, using NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions using "stochastic gradient descent" and "full-batch Adam optimizer with default parameters" but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | The training is conducted for 1000 epochs using stochastic gradient descent with batch size 256 and learning rate η = 0.1... We use the strong model with m = 50 filters and an initialization scale σ_0 = 0.01. We train the strong model using stochastic gradient descent with batch size 256 and learning rate η = 0.1... We train the strong model for 2000 training epochs when n_st = 75 or n_st = 2000, and for 10000 epochs when n_st = 20000... We train each model for 300 epochs using the full-batch Adam optimizer with default parameters. |
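
The strong-model budgets quoted in the Experiment Setup row can be turned into a rough count of SGD updates. The sketch below is illustrative only and is not the authors' code, which the table reports as unreleased. The names `SGDRun`, `strong_model_epochs`, and `n_st` (the strong model's training-set size) are hypothetical, and it assumes the final partial mini-batch of each epoch is kept rather than dropped, which the quoted text does not state.

```python
import math
from dataclasses import dataclass

BATCH_SIZE = 256      # batch size quoted in the Experiment Setup row
LEARNING_RATE = 0.1   # learning rate (eta) quoted in the same row

# Epoch budgets quoted for the strong model, keyed by training-set size.
STRONG_EPOCHS = {75: 2000, 2000: 2000, 20000: 10000}


def strong_model_epochs(n_st: int) -> int:
    """Return the quoted epoch budget; refuse sizes the table does not list."""
    if n_st not in STRONG_EPOCHS:
        raise ValueError(f"no epoch budget reported for n_st={n_st}")
    return STRONG_EPOCHS[n_st]


@dataclass(frozen=True)
class SGDRun:
    """Hypothetical container for one mini-batch SGD run."""
    n_samples: int
    epochs: int
    batch_size: int = BATCH_SIZE
    learning_rate: float = LEARNING_RATE

    def steps_per_epoch(self) -> int:
        # Assumption: the last partial mini-batch is kept, hence the ceiling.
        return math.ceil(self.n_samples / self.batch_size)

    def total_steps(self) -> int:
        return self.epochs * self.steps_per_epoch()


if __name__ == "__main__":
    for n_st in sorted(STRONG_EPOCHS):
        run = SGDRun(n_samples=n_st, epochs=strong_model_epochs(n_st))
        print(n_st, run.steps_per_epoch(), run.total_steps())
```

Under that partial-batch assumption, the three quoted settings come to 2,000, 16,000, and 790,000 parameter updates respectively. A framework that drops the last incomplete batch would give different counts, and no updates at all for the 75-sample case.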