An Information-Theoretic Analysis of In-Context Learning
Authors: Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We introduce new information-theoretic tools that lead to a concise yet general decomposition of error for a Bayes optimal predictor into two components: meta-learning error and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers and corroborate existing results in a simple linear setting. Our theoretical results characterize how error decays in both the number of training sequences and sequence lengths. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Stanford University, Stanford, CA, USA 2Princeton University, Princeton, NJ, USA 3New York University, New York City, NY, USA 4Stanford University, Stanford, CA, USA. |
| Pseudocode | No | The paper contains mathematical derivations, theorems, and proofs but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about releasing code or links to repositories for the methodology described. |
| Open Datasets | No | The paper is theoretical and analyzes abstract data generating processes. It does not mention or use any specific public datasets or provide access information for a dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments. Therefore, it does not provide any dataset split information for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup. It does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters, training configurations, or system-level settings. |
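
The abstract's central claim, that the Bayes optimal predictor's error splits into a meta-learning component and an intra-task component, can be sketched schematically as follows (the symbols $M$, $T$, and $\varepsilon$ are illustrative placeholders, not the paper's notation):

```latex
% Schematic of the error decomposition described in the abstract
% (illustrative notation only): total excess error splits into a
% meta-learning term, decaying in the number of training sequences M,
% and an intra-task term, decaying in the sequence length T.
\[
  \mathcal{L}_{M,T} \;-\; \mathcal{L}^{*}
  \;=\;
  \underbrace{\varepsilon_{\mathrm{meta}}(M)}_{\to\, 0 \ \text{as}\ M \to \infty}
  \;+\;
  \underbrace{\varepsilon_{\mathrm{intra}}(T)}_{\to\, 0 \ \text{as}\ T \to \infty}
\]
```

This matches the abstract's statement that error decays in both the number of training sequences and the sequence length; the precise rates and constants are given in the paper's theorems.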