LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language

Authors: James Requeima, John Bronskill, Dami Choi, Richard Turner, David K. Duvenaud

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We define LLM Processes (LLMPs) using methods we develop for eliciting numerical predictive distributions from LLMs. LLMPs go beyond one-dimensional time series forecasting to multi-dimensional regression and density estimation. We propose two approaches for defining this joint predictive distribution over a collection of query points and evaluate their compatibility in principle with the consistency axioms necessary to specify a valid statistical process. We develop effective prompting practices for eliciting joint numerical predictions. We investigate various methods for conditioning LLMs on numerical data, including prompt formatting, ordering, and scaling (see the prompt-construction sketch after the table), and we characterize which schemes perform best on a set of synthetic tasks. We show that LLMPs are competitive and flexible regressors even on messy data. Through an extensive set of synthetic and real-world experiments, including image reconstruction and black-box function optimization, we evaluate the zero-shot regression and forecasting performance of LLMPs. We demonstrate that LLMPs have well-calibrated uncertainty and are competitive with Gaussian Processes (GPs), LLMTime [2], and Optuna [3].
Researcher Affiliation | Academia | James Requeima, University of Toronto, Vector Institute, requeima@cs.toronto.edu; John Bronskill, University of Cambridge, jfb54@cam.ac.uk; Dami Choi, University of Toronto, choidami@cs.toronto.edu; Richard E. Turner, University of Cambridge, The Alan Turing Institute, ret26@cam.ac.uk; David Duvenaud, University of Toronto, Vector Institute, duvenaud@cs.toronto.edu
Pseudocode | Yes | Algorithm 1: Pseudocode for sampling numbers from an LLM; Algorithm 2: Pseudocode for computing the log pdf of y (see the sketch after the table).
Open Source Code | Yes | Source code available at: https://github.com/requeima/llm_processes
Open Datasets | Yes | We use the 12 synthetic function datasets (Linear, Exponential, Sigmoid, Log, Sine, Beat Inference, Linear + Cosine, Linear Sine, Gaussian Wave, Sinc, Quadratic, X Sine) from Gruver et al. [2], each of which consists of 200 discrete points. For each function we construct 7 datasets, each with 10 random seeds, using subsets of 5, 10, 15, 20, 25, 50, and 75 training points randomly sampled from the original 200 points (see the subsampling sketch after the table).
Dataset Splits | No | The paper describes training points and target points (which serve as test points) and how they are selected or removed, but it does not explicitly mention a distinct validation split for hyperparameter tuning or model selection.
Hardware Specification | Yes | The experiments using the Mixtral-8x7B, Mixtral-8x7B-Instruct [7], Llama-2 70B [8], and Llama-3 70B [9] LLMs were run on two NVIDIA A100 GPUs with 80 GB of memory. The experiments using the Llama-2 7B [8] and Llama-3 8B [9] LLMs were run on one NVIDIA 3090 GPU with 24 GB of memory.
Software Dependencies | No | PyTorch is used as the basis for all of the experiments, with the exception of the Gaussian Process baselines, which are implemented using the GPyTorch package [37]. Specific version numbers for PyTorch or GPyTorch are not provided.
Experiment Setup | Yes | We use the Sigmoid, Quadratic, and Linear + Cosine functions with 10, 20, and 75 training points, respectively (see Appendix D.1), with I-LLMP using the Mixtral-8x7B LLM. Figure G.9 shows that performance is surprisingly insensitive to varying the LLM nucleus sampling parameter top-p [10] and the LLM softmax temperature. Unless otherwise stated, we use 50 samples from the LLM at each target location x and compute the median and the 95% confidence interval of the sample distribution (see the prediction-summary sketch after the table).
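The sketch below illustrates the kind of prompt construction the Research Type row alludes to: serializing numerical (x, y) observations into text with choices of separator, ordering, and scaling. It is a minimal, hypothetical example; the function name, separators, and precision are assumptions, not the authors' exact scheme.

```python
# Hypothetical sketch of formatting numerical (x, y) observations into a text
# prompt, illustrating the prompt formatting, ordering, and scaling choices
# the paper investigates. Format details here are illustrative only.
def format_prompt(train_x, train_y, query_x, decimals=2, scale=1.0, sort=True):
    """Serialize observed points and a query location into a plain-text prompt."""
    pairs = list(zip(train_x, train_y))
    if sort:  # e.g. order observations by x before serializing
        pairs.sort(key=lambda p: p[0])
    lines = [f"{x * scale:.{decimals}f}, {y * scale:.{decimals}f}" for x, y in pairs]
    # The query point is appended with its y value left blank for the LLM to complete.
    lines.append(f"{query_x * scale:.{decimals}f},")
    return "\n".join(lines)

print(format_prompt([0.1, 0.5, 0.3], [1.2, 2.0, 1.6], query_x=0.7))
```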
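The Pseudocode row refers to Algorithm 1 (sampling numbers from an LLM) and Algorithm 2 (computing the log pdf of y). The following is a minimal sketch in the spirit of those algorithms, assuming a hypothetical `llm` interface with `sample_token(prompt)` and `token_logprob(prompt, token)` methods and one token per character; the actual implementation in the linked repository works with a specific model's tokenizer and differs in detail.

```python
import math

def sample_number(llm, prompt, separator="\n", max_tokens=10):
    """Algorithm-1-style sampling: draw tokens until a separator, parse as a float."""
    text = ""
    for _ in range(max_tokens):
        token = llm.sample_token(prompt + text)
        if token == separator:
            break
        text += token
    return float(text)

def log_pdf(llm, prompt, y, decimals=2):
    """Algorithm-2-style log density: sum the log probabilities of the digit
    tokens of y, then convert the mass on the rounded value into a density by
    dividing by the bin width 10**-decimals."""
    y_str = f"{y:.{decimals}f}"
    logp, context = 0.0, prompt
    for ch in y_str:
        logp += llm.token_logprob(context, ch)
        context += ch
    return logp - math.log(10.0 ** (-decimals))
```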
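A sketch of the dataset subsampling described in the Open Datasets row: for each synthetic function, subsets of 5 to 75 training points are drawn from the 200 available points under 10 random seeds. It assumes the 200 points are given as NumPy arrays; function and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np

SUBSET_SIZES = [5, 10, 15, 20, 25, 50, 75]
NUM_SEEDS = 10

def make_subsets(x, y):
    """x, y: arrays of the 200 discrete points for one synthetic function."""
    datasets = {}
    for n in SUBSET_SIZES:
        for seed in range(NUM_SEEDS):
            rng = np.random.default_rng(seed)
            train_idx = np.sort(rng.choice(len(x), size=n, replace=False))
            test_idx = np.setdiff1d(np.arange(len(x)), train_idx)
            datasets[(n, seed)] = {
                "train": (x[train_idx], y[train_idx]),
                "test": (x[test_idx], y[test_idx]),
            }
    return datasets
```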
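Finally, a sketch of the prediction summary from the Experiment Setup row: 50 samples from the LLM at each target location, with the sample median and the 2.5th/97.5th percentiles taken as the 95% interval. It reuses the hypothetical `sample_number` helper from the algorithm sketch above and assumes one prompt per target location.

```python
import numpy as np

def summarize_predictions(llm, prompts, num_samples=50):
    """Return per-location medians and 95% interval bounds from LLM samples."""
    medians, lowers, uppers = [], [], []
    for prompt in prompts:  # one prompt per target location x
        samples = np.array([sample_number(llm, prompt) for _ in range(num_samples)])
        medians.append(np.median(samples))
        lowers.append(np.percentile(samples, 2.5))
        uppers.append(np.percentile(samples, 97.5))
    return np.array(medians), np.array(lowers), np.array(uppers)
```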