In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness

Authors: Liam Collins, Advait Parulekar, Aryan Mokhtari, Sujay Sanghavi, Sanjay Shakkottai

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify all of these results with empirical simulations (Section 3.2 and Appendix J).
Researcher Affiliation | Academia | Liam Collins, Chandra Family Department of ECE, The University of Texas at Austin, liamc@utexas.edu
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The NeurIPS checklist asks whether the paper provides open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results; the authors answer [Yes] with the justification "Please see supplementary material."
Open Datasets | No | The paper generates data from specified distributions rather than using a pre-existing, publicly available dataset: 'f ∼ D(F), x1, …, xn+1 i.i.d. ∼ Dx, ϵ1, …, ϵn i.i.d. ∼ Dϵ' (see the data-generation sketch below).
Dataset Splits | No | The paper describes a pretraining protocol and evaluation on new tasks, but it does not specify explicit training/validation/test splits with percentages or sample counts.
Hardware Specification | No | All experiments were run in Google Colab in a CPU runtime.
Software Dependencies | No | All training was executed in PyTorch with the Adam optimizer.
Experiment Setup | Yes | In all cases we use the Adam optimizer with one task sampled per round, use the noise distribution Dϵ = N(0, σ²), and run 10 trials and plot means and standard deviations over these 10 trials. We use an exponentially decaying learning rate schedule with factor 0.999. In Figures 3 and 5 we use an initial learning rate of 0.1 and in Figure 4 an initial learning rate of 0.01. (See the training-loop sketch below.)
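The data-generation protocol quoted under Open Datasets can be illustrated with a short sketch. Only the overall structure (sample a task function f ∼ D(F), inputs i.i.d. from Dx, and label noise i.i.d. from Dϵ = N(0, σ²)) follows the text; the linear function class, the standard-Gaussian input distribution, and the sample_icl_task helper name are illustrative assumptions, not the paper's exact choices.

```python
import torch

def sample_icl_task(n, d, sigma, generator=None):
    """Sketch of the data-generation protocol: f ~ D(F),
    x_1..x_{n+1} i.i.d. ~ D_x, eps_1..eps_n i.i.d. ~ D_eps = N(0, sigma^2).

    The linear function class and Gaussian inputs below are assumptions
    made for illustration only.
    """
    # f ~ D(F): here an illustrative random linear map w ~ N(0, I_d)
    w = torch.randn(d, generator=generator)
    # x_1, ..., x_{n+1} i.i.d. ~ D_x: here standard Gaussian inputs
    x = torch.randn(n + 1, d, generator=generator)
    # eps_1, ..., eps_n i.i.d. ~ D_eps = N(0, sigma^2)
    eps = sigma * torch.randn(n, generator=generator)
    # Noisy labels for the n in-context examples; clean target for the query
    y_context = x[:n] @ w + eps
    y_query = x[n] @ w
    return x[:n], y_context, x[n], y_query
```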
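The Experiment Setup row translates into a standard PyTorch loop: Adam, an exponentially decaying learning rate with factor 0.999, and one freshly sampled task per round. The sketch below reuses the sample_icl_task helper above and assumes a squared-error objective and a model mapping (context, query) to a scalar prediction; the objective, the model interface, and the train_one_trial name are assumptions rather than details stated in the paper.

```python
import torch

def train_one_trial(model, num_rounds, n, d, sigma, init_lr=0.1):
    """Sketch of the reported optimization setup: Adam with an
    exponentially decaying learning rate (factor 0.999) and one
    freshly sampled task per round.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=init_lr)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
    losses = []
    for _ in range(num_rounds):
        # One task sampled per round, as stated in the setup
        x_ctx, y_ctx, x_query, y_query = sample_icl_task(n, d, sigma)
        pred = model(x_ctx, y_ctx, x_query)    # assumed model interface
        loss = (pred - y_query).pow(2).mean()  # assumed squared-error objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # learning rate decays by a factor of 0.999 each round
        losses.append(loss.item())
    return losses
```

Running this for 10 independent trials (reinitializing the model each time) and plotting the mean and standard deviation of the resulting losses would mirror the reported protocol.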