An Information-Theoretic Analysis of In-Context Learning

Authors: Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We introduce new information-theoretic tools that lead to a concise yet general decomposition of error for a Bayes optimal predictor into two components: meta-learning error and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers and corroborate existing results in a simple linear setting. Our theoretical results characterize how error decays in both the number of training sequences and sequence lengths. (An illustrative sketch of this decomposition appears after the table.)
Researcher Affiliation | Academia | (1) Department of Computer Science, Stanford University, Stanford, CA, USA; (2) Princeton University, Princeton, NJ, USA; (3) New York University, New York City, NY, USA; (4) Stanford University, Stanford, CA, USA.
Pseudocode | No | The paper contains mathematical derivations, theorems, and proofs, but no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statements about releasing code or links to repositories for the methodology described.
Open Datasets | No | The paper is theoretical and analyzes abstract data-generating processes; it does not mention or use any specific public datasets or provide access information for a dataset.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments, so it provides no training, validation, or test split information.
Hardware Specification | No | The paper is theoretical and describes no experimental setup, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and describes no experimental setup; it does not list specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters, training configurations, or system-level settings.
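
For readers skimming the abstract quoted above, the decomposition it references has, schematically, the shape sketched below. The notation is an illustrative reconstruction, not the paper's exact statement: theta stands for shared meta-parameters, psi for task-specific parameters, M for the number of training sequences, T for sequence length, H for observed sequences, L_{M,T} for the expected loss of the Bayes optimal predictor, and L^* for the irreducible error.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Schematic decomposition of Bayes optimal estimation error into a
% meta-learning term and an intra-task term. All symbols here are
% illustrative assumptions; the paper's formal statement may differ.
\[
\underbrace{\mathcal{L}_{M,T} - \mathcal{L}^{*}}_{\text{estimation error}}
\;=\;
\underbrace{\frac{\mathbb{I}\!\left(\theta;\, H_{M+1}^{T} \,\middle|\, H^{(M)}\right)}{T}}_{\text{meta-learning error}}
\;+\;
\underbrace{\frac{\mathbb{I}\!\left(\psi_{M+1};\, H_{M+1}^{T} \,\middle|\, \theta\right)}{T}}_{\text{intra-task error}}
\]
\end{document}
```

Under this reading, the meta-learning term shrinks as the number of training sequences M grows, since conditioning on more sequences reduces the remaining uncertainty about the shared parameters theta, while the intra-task term decays in the sequence length T, since the mutual information in the numerator grows only sublinearly in T. This matches the abstract's claim that error decays in both the number of training sequences and the sequence length.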