In-Context Learning through the Bayesian Prism

Authors: Madhur Panwar, Kabir Ahuja, Navin Goyal

ICLR 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "In this paper we empirically examine how far this Bayesian perspective can help us understand ICL."
Researcher Affiliation | Collaboration | Microsoft Research India ({t-mpanwar, navingo}@microsoft.com) and University of Washington (kahuja@cs.washington.edu)
Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "We release our code at https://github.com/mdrpanwar/icl-bayesian-prism"
Open Datasets | No | "In all of our experiments except the ones concerning the Fourier series, we choose D_X as the standard normal distribution, i.e., N(0, 1), unless specified otherwise." The paper describes how data is generated from distributions (e.g., standard normal) over function families; it does not refer to a pre-existing, downloadable public dataset.
Dataset Splits | No | The paper generates data dynamically from the specified distributions for training and evaluation (e.g., "x_i ∈ R^d and are chosen i.i.d. from a distribution, and f : R^d → R is a function from a family of functions"). It does not specify fixed training, validation, or test splits (a sampling sketch follows this table).
Hardware Specification | Yes | "Our experiments were conducted on a system comprising 32 NVIDIA V100 16GB GPUs."
Software Dependencies | No | The paper mentions using PyTorch, Hugging Face Transformers, scikit-learn, and CVXPY for the implementation and baselines, but does not provide version numbers for these dependencies (a version-recording snippet follows this table).
Experiment Setup | Yes | "Unless specified otherwise, we use 12 layers, 8 heads, and a hidden size (d_h) of 256 in the architecture for all of our experiments. We use a batch size of 64 and train the model for 500k steps. We use Adam optimizer... We train all of our models with curriculum..." (A configuration sketch follows this table.)
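The Open Datasets and Dataset Splits rows both hinge on the same point: prompts are sampled on the fly rather than read from a fixed corpus. A minimal sketch of such a sampler is below, assuming a linear function family f(x) = <w, x> as one illustrative choice (the paper studies several families, including Fourier series); the function name and dimensions are placeholders, not the authors' code.

```python
import torch

def sample_icl_prompt(d=8, n_points=16):
    """Sample one in-context-learning prompt on the fly.

    Inputs x_i are drawn i.i.d. from D_X = N(0, 1) per coordinate, as the
    paper states; the linear family f(x) = <w, x> is an illustrative
    assumption, one of many possible function families.
    """
    xs = torch.randn(n_points, d)  # x_i in R^d, i.i.d. standard normal
    w = torch.randn(d)             # one freshly sampled function per prompt
    ys = xs @ w                    # y_i = f(x_i), with f : R^d -> R
    return xs, ys

# Fresh prompts are drawn at every training and evaluation step,
# which is why no fixed train/validation/test splits exist.
xs, ys = sample_icl_prompt()
```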
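Because the paper names its libraries but not their versions, a reproducer must pin them independently. A snippet like the following records whatever versions are actually installed; the package names come from the paper, and nothing else is assumed.

```python
import torch, transformers, sklearn, cvxpy

# Record the installed library versions, since the paper does not pin them.
for mod in (torch, transformers, sklearn, cvxpy):
    print(f"{mod.__name__}=={mod.__version__}")
```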
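The quoted experiment setup maps naturally onto a GPT-2-style decoder configuration in Hugging Face Transformers, which the paper says it uses. The sketch below wires the stated hyperparameters (12 layers, 8 heads, d_h = 256, Adam, batch size 64, 500k steps) into such a config; the learning rate and the curriculum schedule are not quoted above, so they appear only as labeled placeholders.

```python
import torch
from transformers import GPT2Config, GPT2Model

# Architecture from the quoted setup: 12 layers, 8 heads, hidden size 256.
config = GPT2Config(n_layer=12, n_head=8, n_embd=256)
model = GPT2Model(config)

# Adam optimizer as stated; the learning rate is a placeholder (not quoted).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

BATCH_SIZE = 64        # quoted batch size
TRAIN_STEPS = 500_000  # quoted number of training steps

# The paper also trains with a curriculum; its schedule is not quoted
# above, so it is omitted from this sketch.
```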