Transformers are Minimax Optimal Nonparametric In-Context Learners

Authors: Juno Kim, Tai Nakamaki, Taiji Suzuki

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We provide numerical experiments validating our results in Appendix E. |
| Researcher Affiliation | Academia | University of Tokyo; Center for Advanced Intelligence Project, RIKEN |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states 'NA' for open access to data and code in its checklist, justifying that 'All experiments are toy simulations and data is i.i.d. random'. |
| Open Datasets | No | The paper states 'NA' for open access to data and code in its checklist. It describes generating data from 'nonparametric regression tasks sampled from general function spaces' and 'random combinations of order 2 wavelets' for experiments, but does not provide concrete access to a publicly available dataset. |
| Dataset Splits | No | The paper mentions 'Training and test curves' and 'Training and test losses' but does not explicitly describe validation data splits or their percentages. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'Adam optimizer' but does not provide specific version numbers for it or any other software libraries or dependencies used. |
| Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.02 for all layers. ... after 50 epochs while varying (a) DNN width N; (b) number of in-context samples n; (c) number of tasks T. (A minimal reproduction sketch follows the table.) |
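The Experiment Setup row quotes only the optimizer, learning rate, epoch count, and the swept quantities (DNN width N, in-context samples n, number of tasks T), and no code is released. The sketch below is therefore a hypothetical reconstruction of such a setup, not the authors' implementation: the task generator (a Haar-wavelet stand-in for the order-2 wavelets), the transformer dimensions, the task pool, and the squared-error loss are assumptions introduced here for illustration. Only the Adam optimizer, the 0.02 learning rate, the 50 epochs, and the names N, n, T come from the quoted setup.

```python
# Hypothetical sketch of the described setup: in-context nonparametric regression
# with tasks built from random wavelet combinations, trained with Adam (lr = 0.02)
# for 50 epochs. Model sizes, task construction, and loss are assumed for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)

T, n, N = 64, 32, 128                    # number of tasks, in-context samples, DNN width (assumed values)
d_model, n_heads, n_layers = 64, 4, 2    # transformer size (assumed)

def haar_mother(t):
    """Haar mother wavelet; a simple stand-in for the order-2 wavelets in the paper."""
    pos = ((t >= 0) & (t < 0.5)).float()
    neg = ((t >= 0.5) & (t < 1.0)).float()
    return pos - neg

def sample_task(n_points):
    """One regression task: a random combination of dilated and translated wavelets."""
    x = torch.rand(n_points, 1)
    y = torch.zeros(n_points, 1)
    for j in range(3):                   # resolution levels (assumed)
        for k in range(2 ** j):
            coef = torch.randn(1) * 2.0 ** (-j)
            y = y + coef * haar_mother(2.0 ** j * x - k)
    return x, y

class InContextRegressor(nn.Module):
    """Tiny encoder-only transformer that reads (x_i, y_i) pairs plus a query x."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=N, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, ctx_x, ctx_y, query_x):
        # Context tokens carry (x, y); the query token carries (x, 0).
        ctx = torch.cat([ctx_x, ctx_y], dim=-1)
        qry = torch.cat([query_x, torch.zeros_like(query_x)], dim=-1)
        tokens = torch.cat([ctx, qry], dim=1)
        h = self.encoder(self.embed(tokens))
        return self.head(h[:, -1])       # prediction at the query position

# Fixed pretraining pool of T tasks, each with n context points and one query point.
tasks = [sample_task(n + 1) for _ in range(T)]

model = InContextRegressor()
opt = torch.optim.Adam(model.parameters(), lr=0.02)   # lr = 0.02 as stated in the paper

for epoch in range(50):                                # 50 epochs as stated in the paper
    total = 0.0
    for x, y in tasks:
        ctx_x, ctx_y = x[:n].unsqueeze(0), y[:n].unsqueeze(0)
        qry_x, qry_y = x[n:].unsqueeze(0), y[n:]
        pred = model(ctx_x, ctx_y, qry_x)
        loss = ((pred - qry_y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        total += loss.item()
    if epoch % 10 == 0:
        print(f"epoch {epoch:2d}  mean in-context MSE {total / T:.4f}")
```

Reproducing the paper's sweeps over (a) DNN width N, (b) number of in-context samples n, and (c) number of tasks T would amount to re-running this script with different values of those constants; the specific grid of values is not stated in the quoted setup and is not assumed here.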