Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Density estimation with LLMs: a geometric investigation of in-context learning trajectories
Authors: Toni Liu, Nicolas Boullé, Raphaël Sarfati, Christopher Earls
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate LLMs' ability to perform density estimation (DE), which involves estimating the probability density function (PDF) from data observed in-context. Our core experiment is remarkably straightforward. As illustrated in Figure 1, we prompt LLMs such as LLaMA-2 (Touvron et al., 2023), Gemma (Gemma Team et al., 2024), and Mistral (Jiang et al., 2023) with a series of data points {X_i}_{i=1}^n sampled independently and identically from an underlying distribution p(x). We then observe that the LLM's predicted PDF, p̂_n(x), for the next data point gradually converges to the ground truth as the context length n (the number of in-context data points) increases. |
| Researcher Affiliation | Academia | Toni J.B. Liu, Department of Physics, Cornell University, USA (EMAIL); Raphaël Sarfati, School of Civil and Environmental Engineering, Cornell University, USA (EMAIL); Nicolas Boullé, Department of Mathematics, Imperial College London, UK (EMAIL); Christopher J. Earls, Center for Applied Mathematics, School of Civil and Environmental Engineering, Cornell University, USA (EMAIL) |
| Pseudocode | No | The paper describes methods and steps in prose (e.g., "Our methodology consists of 5 steps..."), but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codebase, along with a 3D visualization of an LLM's in-context learning trajectory, is publicly available at https://github.com/AntonioLiu97/LLMICL_inPCA. |
| Open Datasets | No | We investigate LLMs' ability to perform density estimation (DE), which involves estimating the probability density function (PDF) from data observed in-context. Our core experiment is remarkably straightforward. As illustrated in Figure 1, we prompt LLMs such as LLaMA-2 (Touvron et al., 2023), Gemma (Gemma Team et al., 2024), and Mistral (Jiang et al., 2023) with a series of data points {X_i}_{i=1}^n sampled independently and identically from an underlying distribution p(x). The paper details how target distributions are created (Gaussian, Uniform, and randomly generated via Gaussian Processes in Appendix A.9), but does not provide specific links, DOIs, or repositories for the sampled datasets used in the experiments. |
| Dataset Splits | No | The paper investigates in-context learning by varying the number of data points (context length 'n') provided to the LLM for density estimation, rather than using predefined training, validation, and test splits from a static dataset. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, or memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'SciPy' for numerical optimization but does not provide its version number or any other specific software dependencies with their versions. |
| Experiment Setup | Yes | As illustrated in Figure 1, we prompt LLMs such as LLaMA-2 (Touvron et al., 2023), Gemma (Gemma Team et al., 2024), and Mistral (Jiang et al., 2023) with a series of data points {X_i}_{i=1}^n sampled independently and identically from an underlying distribution p(x). We set α = 1, effectively populating each bin with one "hallucinated" data point prior to observing any data (Jeffreys, 1946). Unless otherwise noted, we use C = 1 for classical KDE in this paper. For a given DE trajectory p̂_1(x), …, p̂_n(x), we optimize our bespoke KDE to minimize the Hellinger distance at each context length i: min_{s_i ∈ (0, ∞), h_i ∈ (0, ∞)} D_Hel(p̂_i(x) ‖ p̂_{h_i, s_i}(x)). While LLaMA-2 has a context window of 4096 tokens (equivalent to 1365 comma-delimited, 2-digit data points), we limit our analysis to a context length of n = 200. |
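The Experiment Setup row describes fitting a bespoke KDE to each predicted PDF by minimizing the Hellinger distance, with a Dirichlet prior of α = 1 on the histogram bins. A minimal sketch of that optimization in Python, assuming a plain Gaussian kernel (the paper's bespoke kernel also carries a shape parameter s, fixed here for simplicity) and a prior-smoothed histogram standing in for the LLM's predicted PDF p̂_i(x):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def hellinger(p, q, dx):
    """Discretized Hellinger distance between two densities on a shared grid."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)

def gaussian_kde_pdf(x_grid, data, h):
    """Gaussian KDE with bandwidth h, evaluated at the grid points x_grid."""
    z = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def prior_smoothed_hist(data, edges, alpha=1.0):
    """Histogram density with alpha 'hallucinated' points per bin (Dirichlet prior)."""
    counts, _ = np.histogram(data, bins=edges)
    dx = edges[1] - edges[0]
    probs = (counts + alpha) / (counts.sum() + alpha * len(counts))
    return probs / dx  # normalize so the density integrates to 1

# Hypothetical stand-in for p_hat_i(x): n = 200 i.i.d. samples in [0, 1),
# binned over 100 bins (mirroring 2-digit data) with alpha = 1.
data = rng.normal(0.55, 0.1, size=200)
edges = np.linspace(0.0, 1.0, 101)
centers = 0.5 * (edges[:-1] + edges[1:])
dx = edges[1] - edges[0]
p_hat = prior_smoothed_hist(data, edges, alpha=1.0)

# Fit the KDE bandwidth by minimizing the Hellinger distance to p_hat,
# analogous to the paper's per-context-length optimization over (h_i, s_i).
res = minimize(
    lambda h: hellinger(p_hat, gaussian_kde_pdf(centers, data, h[0]), dx),
    x0=[0.05],
    bounds=[(1e-3, 1.0)],
)
print(f"best-fit bandwidth h = {res.x[0]:.3f}")
```

All names here (`prior_smoothed_hist`, the grid, the sample distribution) are illustrative assumptions, not the authors' implementation; the paper additionally optimizes the kernel shape s jointly with h.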