Transformers are Minimax Optimal Nonparametric In-Context Learners
Authors: Juno Kim, Tai Nakamaki, Taiji Suzuki
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide numerical experiments validating our results in Appendix E. |
| Researcher Affiliation | Academia | University of Tokyo; Center for Advanced Intelligence Project, RIKEN |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper's checklist marks open access to data and code as 'NA', with the justification that 'All experiments are toy simulations and data is i.i.d. random'. |
| Open Datasets | No | The paper's checklist likewise marks open access to data as 'NA'. Its experiments use synthetically generated data, described as 'nonparametric regression tasks sampled from general function spaces' and 'random combinations of order 2 wavelets', and no publicly available dataset is provided. |
| Dataset Splits | No | The paper mentions 'Training and test curves' and 'Training and test losses' but does not explicitly describe validation data splits or their percentages. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'Adam optimizer' but does not specify version numbers for it or for any other software libraries or dependencies. |
| Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.02 for all layers. ... after 50 epochs while varying (a) DNN width N; (b) number of in-context samples n; (c) number of tasks T. |
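
Since the paper releases no code, the snippet below is only a minimal sketch of the setup quoted in the last row: regression tasks built from random combinations of order-2 wavelets, trained with the Adam optimizer at learning rate 0.02 for 50 epochs while varying the DNN width N, the number of in-context samples n, and the number of tasks T. The wavelet construction, the placeholder `InContextRegressor` architecture, and all default sizes are illustrative assumptions, not the authors' transformer implementation.

```python
# Hypothetical sketch; the paper provides no code. Only "Adam, lr=0.02, 50 epochs" and the
# varied quantities (DNN width N, in-context samples n, tasks T) come from the quoted text;
# the wavelet construction, architecture, and default sizes below are assumptions.
import torch
import torch.nn as nn


def order2_wavelet(x: torch.Tensor) -> torch.Tensor:
    # Assumed mother wavelet: a piecewise-linear (order-2 B-spline) hat function on [0, 1].
    return torch.clamp(1.0 - torch.abs(2.0 * x - 1.0), min=0.0)


def sample_task(n_points: int, max_level: int = 3) -> tuple[torch.Tensor, torch.Tensor]:
    # One regression task: a random combination of dilated/shifted order-2 wavelets,
    # evaluated at i.i.d. uniform inputs (the paper only says the data is "i.i.d. random").
    x = torch.rand(n_points, 1)
    y = torch.zeros(n_points, 1)
    for level in range(max_level + 1):
        for shift in range(2 ** level):
            coeff = 2.0 ** (-level) * torch.randn(())  # decaying random coefficients (assumed)
            y = y + coeff * order2_wavelet(2.0 ** level * x - shift)
    return x, y


class InContextRegressor(nn.Module):
    # Placeholder model (NOT the paper's transformer): an MLP of width N that maps the
    # flattened in-context pairs plus a query point to a prediction.
    def __init__(self, n_context: int, width: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * n_context + 1, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 1),
        )

    def forward(self, ctx_x: torch.Tensor, ctx_y: torch.Tensor, query_x: torch.Tensor) -> torch.Tensor:
        context = torch.cat([ctx_x, ctx_y], dim=0).reshape(1, -1)
        return self.net(torch.cat([context, query_x.reshape(1, -1)], dim=1))


def train(width_N: int = 64, n_context: int = 32, n_tasks_T: int = 200, epochs: int = 50) -> float:
    # Quoted setup: Adam with learning rate 0.02 for all layers, run for 50 epochs,
    # while varying width N, in-context sample count n, and task count T.
    model = InContextRegressor(n_context, width_N)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.02)
    loss_fn = nn.MSELoss()
    # Each pretraining task provides n in-context pairs plus one held-out query (assumed).
    tasks = [sample_task(n_context + 1) for _ in range(n_tasks_T)]
    for _ in range(epochs):
        total = 0.0
        for x, y in tasks:
            ctx_x, ctx_y, query_x, query_y = x[:-1], y[:-1], x[-1:], y[-1:]
            optimizer.zero_grad()
            loss = loss_fn(model(ctx_x, ctx_y, query_x), query_y.reshape(1, 1))
            loss.backward()
            optimizer.step()
            total += loss.item()
    return total / n_tasks_T


if __name__ == "__main__":
    print(f"final mean training loss: {train():.4f}")
```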