Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Implicit Optimization Bias of Next-token Prediction in Linear Models
Authors: Christos Thrampoulidis
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we numerically verify these findings and discuss related and future work in Secs. 6 and 7. Additional experiments, further related work and detailed proofs are in the appendix. We validate our analysis with experiments on synthetic data in App. A. |
| Researcher Affiliation | Academia | Christos Thrampoulidis Department of Electrical and Computer Engineering University of British Columbia Vancouver, Canada EMAIL |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | For completeness, the code will be made publicly available on Github in the final version of the paper. |
| Open Datasets | No | We construct dataset with n = 5000 sequences involving m = 50 distinct contexts. ... The support sets $S_j \subseteq \mathcal{V}$ and the probabilities $\hat{p}_{j,z}, z \in S_j$ are chosen randomly; see Fig. 3 for representative examples from the training dataset. |
| Dataset Splits | No | The paper does not explicitly describe train/validation/test dataset splits (e.g., percentages or counts) or cross-validation details for the synthetic data. |
| Hardware Specification | Yes | All experiments were conducted on a MacBook Pro equipped with a 2.3 GHz Quad-Core Intel Core i7 processor and 32 GB of memory. |
| Software Dependencies | No | The experiments are of relatively small scale and were implemented in Matlab. |
| Experiment Setup | Yes | For GD, we use learning rate η = 0.5 and for NGD and Adam η = 0.01. For Adam, we also set β1 = 0.9, β2 = 0.99. We run all algorithms for 1e4 iterations. |
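The Open Datasets and Experiment Setup rows together describe a synthetic next-token-prediction (NTP) task and three optimizers. The pure-Python sketch below illustrates that configuration under stated assumptions; it is not the author's Matlab code, which the table notes is not yet public. It trains a linear decoder `W` over fixed context embeddings `h_j` by minimizing the cross-entropy weighted by empirical context frequencies, using GD (η = 0.5), normalized GD and Adam (η = 0.01, β1 = 0.9, β2 = 0.99) as quoted above. Everything else is an assumption not found in the source: the vocabulary size, embedding dimension and support sizes, Gaussian embeddings normalized to unit norm, uniform sampling of contexts, zero initialization, Adam's ε = 1e-8, and the hypothetical function names `make_dataset`, `loss_and_grad` and `train`.

```python
import math
import random


def make_dataset(n=5000, m=50, vocab=10, d=20, max_support=4, seed=0):
    """Hypothetical synthetic NTP data: m distinct contexts j, each with a random
    support set S_j of next tokens and random probabilities p_hat[j][z], z in S_j."""
    rng = random.Random(seed)
    H = []
    for _ in range(m):
        # Assumption: fixed Gaussian context embeddings, normalized to unit norm.
        h = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in h))
        H.append([x / norm for x in h])
    # Empirical context frequencies pi_hat_j from n sequences (assumption: contexts drawn uniformly).
    counts = [0] * m
    for _ in range(n):
        counts[rng.randrange(m)] += 1
    pi = [c / n for c in counts]
    P = []
    for _ in range(m):
        support = rng.sample(range(vocab), rng.randint(1, max_support))
        weights = [rng.uniform(0.1, 1.0) for _ in support]
        row = [0.0] * vocab
        for z, w in zip(support, weights):
            row[z] = w / sum(weights)  # probabilities on S_j sum to 1, zero elsewhere
        P.append(row)
    return H, pi, P


def log_softmax(logits):
    # Numerically stable log-softmax via the log-sum-exp trick.
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(v - mx) for v in logits))
    return [v - lse for v in logits]


def loss_and_grad(W, H, pi, P):
    """CE(W) = -sum_j pi_j sum_z p_hat[j][z] log softmax(W h_j)_z and its gradient
    sum_j pi_j (softmax(W h_j) - p_hat_j) h_j^T."""
    vocab, d = len(W), len(W[0])
    G = [[0.0] * d for _ in range(vocab)]
    loss = 0.0
    for h, weight, p_row in zip(H, pi, P):
        logq = log_softmax([sum(w * x for w, x in zip(row, h)) for row in W])
        for z in range(vocab):
            if p_row[z] > 0:
                loss -= weight * p_row[z] * logq[z]
            coef = weight * (math.exp(logq[z]) - p_row[z])
            for k in range(d):
                G[z][k] += coef * h[k]
    return loss, G


def train(H, pi, P, vocab, method="gd", iters=10_000, beta1=0.9, beta2=0.99, eps=1e-8):
    """Runs GD (eta = 0.5), normalized GD or Adam (eta = 0.01) from W = 0."""
    if method not in ("gd", "ngd", "adam"):
        raise ValueError(f"unknown method: {method}")
    lr = 0.5 if method == "gd" else 0.01
    d = len(H[0])
    W = [[0.0] * d for _ in range(vocab)]
    M = [[0.0] * d for _ in range(vocab)]  # Adam first-moment estimate
    S = [[0.0] * d for _ in range(vocab)]  # Adam second-moment estimate
    losses = []  # loss before each update
    for t in range(1, iters + 1):
        loss, G = loss_and_grad(W, H, pi, P)
        losses.append(loss)
        gnorm = math.sqrt(sum(g * g for row in G for g in row)) or 1.0
        for z in range(vocab):
            for k in range(d):
                g = G[z][k]
                if method == "gd":
                    W[z][k] -= lr * g
                elif method == "ngd":
                    W[z][k] -= lr * g / gnorm  # step of fixed Frobenius length lr
                else:
                    M[z][k] = beta1 * M[z][k] + (1 - beta1) * g
                    S[z][k] = beta2 * S[z][k] + (1 - beta2) * g * g
                    m_hat = M[z][k] / (1 - beta1 ** t)  # bias-corrected moments
                    s_hat = S[z][k] / (1 - beta2 ** t)
                    W[z][k] -= lr * m_hat / (math.sqrt(s_hat) + eps)
    return W, losses
```

For example, `H, pi, P = make_dataset()` followed by `train(H, pi, P, vocab=10, method="adam")` runs the 1e4 iterations quoted above; this is slow in pure Python, so a smaller `iters` is convenient for quick checks, and a vectorized version would be preferable for real runs. With unit-norm embeddings the loss is (1/2)-smooth in `W`, so the quoted GD step η = 0.5 is a guaranteed descent step in this sketch; whether the paper normalizes its embeddings is not stated in this table.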