Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context
Authors: Taejong Joo, Diego Klabjan
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To quantify optimality of ICL as a learning algorithm, we compare ICL's sample complexity-related measures to those of principled learning algorithms by revisiting the performance profiles [20], a classic benchmarking framework for optimization software. As a result, we uncover a new insight on optimality of ICL in Section 3: While ICL with few-shot demonstrations achieves near-optimal sample complexity, ICL's sample complexity sharply deteriorates as the number of demonstrations increases in long context. Concretely, many-shot ICL often requires 1.5 times more demonstrations than the Bayes optimal estimator to achieve the same performance. This indicates that, although transformers are theoretically capable of implementing principled algorithms in-context [19], their in-context learning behavior deviates significantly from the optimal learning algorithm in the many-shot regime. We further present evidence that, unlike principled algorithms, ICL may lack fundamental statistical properties (e.g., consistency and asymptotic efficiency) that are critical for algorithms to effectively learn from large demonstration sizes. |
| Researcher Affiliation | Academia | Taejong Joo & Diego Klabjan Department of Industrial Engineering & Management Sciences Northwestern University Evanston, IL, USA EMAIL |
| Pseudocode | No | The paper describes methodologies, objectives, and theoretical analyses using mathematical equations and text, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is available at https://github.com/tjoo512/technical-debt-in-icl. |
| Open Datasets | No | For the data generating distribution of a prompt H_T, we follow the approach of sampling target functions f from a hierarchical distribution [21] to capture a more interesting aspect of a learning algorithm: model selection. |
| Dataset Splits | No | The paper describes a synthetic data generation process for each prompt/task and defines training and test context lengths (T_train and T). However, it does not specify traditional training, validation, and test dataset splits from a pre-existing static dataset, as the data is generated on the fly per task instance. |
| Hardware Specification | Yes | In this work, we use multiple servers which consist of multiple GPUs including RTX 8000 (50GB) and A100 (40GB). |
| Software Dependencies | No | The paper mentions using the GPT-2 architecture and the Adam optimizer but does not specify version numbers for general software dependencies like programming languages, frameworks (e.g., PyTorch, TensorFlow), or other libraries. |
| Experiment Setup | Yes | For the model, we use the GPT-2 [22] architecture for TFθ, which is a standard architecture in the meta ICL and other stylized experimental settings; that is, we define TFθ as a decoder-only transformer [49] with 12 layers, 8 attention heads, and 256-dimensional embedding space. For minimizing the ICL objective l(θ), we compute the stochastic gradient with 64 prompts and update θ by using the Adam optimizer [52] with fixed learning rate of 10^-4 for one million training iterations. Also, in order to boost the convergence speed, we use curriculum learning [53] as recommended in [16, 21] by increasing the length of the prompt by 2 every 2,000 training iterations until it reaches (2M + 1) (and the order of Fourier series by 1 until it reaches M). |
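
The curriculum described in the Experiment Setup row can be sketched as a small schedule function. This is a minimal illustration, not the authors' implementation: the names `CurriculumConfig`, `curriculum`, and `REPORTED_SETUP` are hypothetical, and the starting prompt length (3) and starting Fourier order (1) are assumptions, since the quoted text gives only the increments, the update interval, and the caps.

```python
from dataclasses import dataclass

# Hyperparameters quoted in the Experiment Setup row (dictionary name is hypothetical).
REPORTED_SETUP = {
    "n_layers": 12,
    "n_heads": 8,
    "embedding_dim": 256,
    "prompts_per_batch": 64,
    "learning_rate": 1e-4,
    "training_iterations": 1_000_000,
}


@dataclass(frozen=True)
class CurriculumConfig:
    max_order: int            # M, the maximum order of the Fourier series
    start_length: int = 3     # ASSUMPTION: initial prompt length is not stated in the source
    start_order: int = 1      # ASSUMPTION: initial Fourier order is not stated in the source
    length_step: int = 2      # prompt length grows by 2 per stage (from the source)
    order_step: int = 1       # Fourier order grows by 1 per stage (from the source)
    interval: int = 2_000     # training iterations per curriculum stage (from the source)


def curriculum(step: int, cfg: CurriculumConfig) -> tuple[int, int]:
    """Return (prompt_length, fourier_order) for a given training iteration."""
    stage = step // cfg.interval
    # Both quantities grow linearly with the stage and are capped at 2M + 1 and M.
    length = min(cfg.start_length + cfg.length_step * stage, 2 * cfg.max_order + 1)
    order = min(cfg.start_order + cfg.order_step * stage, cfg.max_order)
    return length, order


if __name__ == "__main__":
    cfg = CurriculumConfig(max_order=5)
    for step in (0, 2_000, 4_000, 8_000, REPORTED_SETUP["training_iterations"]):
        print(step, curriculum(step, cfg))
```

Under the assumed starting values, the prompt length stays at 2 × order + 1 at every stage, so both quantities reach their caps in the same stage; different starting values would break that alignment.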