Implicit Optimization Bias of Next-token Prediction in Linear Models
Authors: Christos Thrampoulidis
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we numerically verify these findings and discuss related and future work in Secs. 6 and 7. Additional experiments, further related work and detailed proofs are in the appendix. We validate our analysis with experiments on synthetic data in App. A. |
| Researcher Affiliation | Academia | Christos Thrampoulidis, Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada. cthrampo@ece.ubc.ca |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | For completeness, the code will be made publicly available on Github in the final version of the paper. |
| Open Datasets | No | We construct a dataset with n = 5000 sequences involving m = 50 distinct contexts. ... The support sets $\mathcal{S}_j \subseteq \mathcal{V}$ and the probabilities $\hat{p}_{j,z},\, z \in \mathcal{S}_j$ are chosen randomly; see Fig. 3 for representative examples from the training dataset. (A hedged generation sketch follows the table.) |
| Dataset Splits | No | The paper does not explicitly describe train/validation/test dataset splits (e.g., percentages or counts) or cross-validation details for the synthetic data. |
| Hardware Specification | Yes | All experiments were conducted on a MacBook Pro equipped with a 2.3 GHz Quad-Core Intel Core i7 processor and 32 GB of memory. |
| Software Dependencies | No | The experiments are of relatively small scale and were implemented in Matlab. |
| Experiment Setup | Yes | For GD, we use learning rate η = 0.5 and for NGD and Adam η = 0.01. For Adam, we also set β1 = 0.9, β2 = 0.99. We run all algorithms for 1e4 iterations. (An optimizer-loop sketch follows the table.) |
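The synthetic dataset quoted in the Open Datasets row is straightforward to regenerate. Below is a minimal NumPy sketch, not the author's code: the vocabulary size `V`, the support-size bound `max_support`, and the Dirichlet draw for the next-token probabilities are our assumptions and are not specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, V = 5000, 50, 20       # n sequences, m distinct contexts (from the paper); V is assumed
max_support = 5              # assumed upper bound on the support size |S_j|

contexts = rng.integers(0, m, size=n)           # each sequence ends in one of m contexts
supports, probs = [], []
for j in range(m):
    k = int(rng.integers(2, max_support + 1))   # support size |S_j| >= 2
    S_j = rng.choice(V, size=k, replace=False)  # random support set S_j subset of the vocabulary
    p_j = rng.dirichlet(np.ones(k))             # random next-token probabilities p̂_{j,z}
    supports.append(S_j)
    probs.append(p_j)

# Sample one next token per sequence from its context's conditional distribution.
next_tokens = np.array([rng.choice(supports[j], p=probs[j]) for j in contexts])
```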
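The Experiment Setup row fixes the optimizer hyperparameters but not the training loop itself. The sketch below uses the reported step sizes and Adam betas on the cross-entropy loss of a linear model; the one-hot context embeddings, the NGD normalization by the full gradient norm, and ε = 1e-8 are our assumptions.

```python
import numpy as np

def ce_loss_and_grad(W, E, Y):
    """Average cross-entropy of the linear model logits = E @ W.T with integer targets Y."""
    logits = E @ W.T
    logits -= logits.max(axis=1, keepdims=True)      # numerically stable softmax
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(P[np.arange(len(Y)), Y]))
    G = (P - np.eye(W.shape[0])[Y]).T @ E / len(Y)   # gradient w.r.t. W
    return loss, G

def run(optimizer, W, E, Y, iters=10_000):
    """Train W with GD, NGD, or Adam using the step sizes reported in the paper."""
    eta = {"gd": 0.5, "ngd": 0.01, "adam": 0.01}[optimizer]
    beta1, beta2, eps = 0.9, 0.99, 1e-8              # betas from the paper; eps assumed
    m1, v1 = np.zeros_like(W), np.zeros_like(W)
    for t in range(1, iters + 1):
        _, G = ce_loss_and_grad(W, E, Y)
        if optimizer == "gd":
            W = W - eta * G
        elif optimizer == "ngd":                     # normalized gradient descent (assumed form)
            W = W - eta * G / (np.linalg.norm(G) + eps)
        else:                                        # Adam with bias correction
            m1 = beta1 * m1 + (1 - beta1) * G
            v1 = beta2 * v1 + (1 - beta2) * G ** 2
            W = W - eta * (m1 / (1 - beta1 ** t)) / (np.sqrt(v1 / (1 - beta2 ** t)) + eps)
    return W
```

Reusing the variables from the data sketch above, a run might look like `W_gd = run("gd", np.zeros((V, m)), np.eye(m)[contexts], next_tokens)`, where the one-hot context embedding (d = m) is again an assumption rather than the paper's stated setup.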