Implicit Optimization Bias of Next-token Prediction in Linear Models

Author: Christos Thrampoulidis

NeurIPS 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "Finally, we numerically verify these findings and discuss related and future work in Secs. 6 and 7. Additional experiments, further related work and detailed proofs are in the appendix. We validate our analysis with experiments on synthetic data in App. A."
Researcher Affiliation | Academia | "Christos Thrampoulidis, Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada. cthrampo@ece.ubc.ca"
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | "For completeness, the code will be made publicly available on Github in the final version of the paper."
Open Datasets | No | "We construct a dataset with n = 5000 sequences involving m = 50 distinct contexts. ... The support sets $S_j \subseteq \mathcal{V}$ and the probabilities $\hat{p}_{j,z},\, z \in S_j$, are chosen randomly; see Fig. 3 for representative examples from the training dataset." (A data-generation sketch follows the table.)
Dataset Splits | No | The paper does not explicitly describe train/validation/test dataset splits (e.g., percentages or counts) or cross-validation details for the synthetic data.
Hardware Specification | Yes | "All experiments were conducted on a MacBook Pro equipped with a 2.3 GHz Quad-Core Intel Core i7 processor and 32 GB of memory."
Software Dependencies | No | "The experiments are of relatively small scale and were implemented in Matlab."
Experiment Setup | Yes | "For GD, we use learning rate η = 0.5 and for NGD and Adam η = 0.01. For Adam, we also set β1 = 0.9, β2 = 0.99. We run all algorithms for 1e4 iterations." (An optimizer-configuration sketch follows the table.)
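
The Open Datasets row reports only high-level parameters: n = 5000 sequences over m = 50 distinct contexts, with randomly chosen support sets and next-token probabilities. Since the paper's Matlab code is not released, the sketch below is only one plausible way to generate such data; the vocabulary size, the support-size bound, and the Dirichlet sampling of probabilities are all assumptions, written in Python for illustration.

```python
# Hypothetical sketch of the paper's synthetic data construction.
# vocab_size, max_support, and the rng seed are NOT stated in the
# quoted text; they are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, m = 5000, 50          # sequences and distinct contexts, as reported
vocab_size = 10          # |V|: assumed, not given in the excerpt
max_support = 5          # upper bound on |S_j|: assumed

# For each context j, draw a random support set S_j subset of V and
# random next-token probabilities p_hat[j, z] supported on S_j.
supports, p_hat = [], np.zeros((m, vocab_size))
for j in range(m):
    k = rng.integers(1, max_support + 1)
    S_j = rng.choice(vocab_size, size=k, replace=False)
    probs = rng.dirichlet(np.ones(k))   # random point on the simplex
    supports.append(S_j)
    p_hat[j, S_j] = probs

# Build the training set: each example is a context index paired with
# a next token sampled from that context's distribution.
contexts = rng.integers(0, m, size=n)
next_tokens = np.array(
    [rng.choice(vocab_size, p=p_hat[j]) for j in contexts]
)
```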
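
Likewise, the Experiment Setup row gives only hyperparameters. Below is a minimal sketch of the three reported optimizers with those settings, assuming NGD denotes normalized gradient descent (the gradient rescaled to unit norm before the step); the gradient function and model are placeholders, not the paper's actual objective or implementation.

```python
# Minimal sketch of the reported optimizer settings: GD with eta = 0.5,
# NGD and Adam with eta = 0.01, Adam betas (0.9, 0.99), 1e4 iterations.
# grad_fn is a placeholder for the loss gradient; the paper's own code
# (in Matlab) is not available, so this is illustrative only.
import numpy as np

def run(grad_fn, W0, method, iters=10_000):
    W = W0.copy()
    if method == "gd":
        eta = 0.5
        for _ in range(iters):
            W -= eta * grad_fn(W)
    elif method == "ngd":
        eta = 0.01
        for _ in range(iters):
            g = grad_fn(W)
            W -= eta * g / (np.linalg.norm(g) + 1e-12)  # unit-norm step
    elif method == "adam":
        eta, b1, b2, eps = 0.01, 0.9, 0.99, 1e-8
        m_t = np.zeros_like(W)
        v_t = np.zeros_like(W)
        for t in range(1, iters + 1):
            g = grad_fn(W)
            m_t = b1 * m_t + (1 - b1) * g          # first-moment estimate
            v_t = b2 * v_t + (1 - b2) * g**2       # second-moment estimate
            m_hat = m_t / (1 - b1**t)              # bias correction
            v_hat = v_t / (1 - b2**t)
            W -= eta * m_hat / (np.sqrt(v_hat) + eps)
    return W

# Example usage on a toy quadratic loss 0.5 * ||W||^2 (gradient = W):
W_final = run(lambda W: W, np.ones((3, 4)), method="adam")
```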