In-Context Learning Dynamics with Random Binary Sequences
Authors: Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tomer D. Ullman
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a framework that enables us to analyze in-context learning dynamics to understand the latent concepts underlying LLMs' behavioral patterns. This provides a more nuanced understanding than success-or-failure evaluation benchmarks, but does not require observing internal activations as a mechanistic interpretation of circuits would. Inspired by the cognitive science of human randomness perception, we use random binary sequences as context and study dynamics of in-context learning by manipulating properties of context data, such as sequence length. In the latest GPT-3.5+ models, we find emergent abilities to generate seemingly random numbers and learn basic formal languages, with striking in-context learning dynamics where model outputs transition sharply from seemingly random behaviors to deterministic repetition. |
| Researcher Affiliation | Collaboration | Eric J. Bigelow¹˒², Ekdeep Singh Lubana²˒³, Robert P. Dick³, Hidenori Tanaka²˒⁴, Tomer D. Ullman¹˒². ¹Psychology Department, Harvard University, Cambridge, MA, USA; ²Center for Brain Science, Harvard University, Cambridge, MA, USA; ³EECS Department, University of Michigan, Ann Arbor, MI, USA; ⁴Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All code and data used for this paper are available at https://github.com/ebigelow/ICL-Random-Binary. |
| Open Datasets | Yes | All code and data used for this paper are available at https://github.com/ebigelow/ICL-Random-Binary. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages or exact counts). It discusses varying context length and P(Tails) for data generation but not explicit dataset partitioning for reproducibility. |
| Hardware Specification | Yes | LLM inference was run on 2x NVIDIA A100 80GB GPUs in Harvard's FAS-RC cluster. |
| Software Dependencies | Yes | All calls were made with the OpenAI API, using default parameters, including (important to our analysis) a temperature parameter of 1.0. ... We evaluated the following LLMs: meta-llama/Llama-2-7b-hf (float32) ... mistralai/Mixtral-8x7B-Instruct-v0.1 (float16) ... allenai/tulu-2-dpo-70b (float16) |
| Experiment Setup | Yes | Temperature parameters are set to 1, and other parameters follow OpenAI API defaults; see App. B for more details. ... We collect 200 output sequences y for each LLM at each P(Tails) ∈ {.05, .1, .2, .3, .4, .49, .5, .51, .60, .70, .80, .90, .95}, cropping output tokens to \|y\| = 50 to limit cost and for simplicity. (A minimal code sketch of this data-collection loop follows the table.) |
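
The rows above describe the data-generation and querying setup in prose; the sketch below shows one way such a loop could be reproduced with the OpenAI Python client. The prompt wording, model name, `max_tokens` budget, and context length are illustrative assumptions, not the authors' exact configuration (their code is available in the linked repository).

```python
import random
from openai import OpenAI  # assumes the official OpenAI Python client (>= 1.0) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def make_context(p_tails: float, length: int, seed=None) -> str:
    """Sample a binary flip sequence with the given P(Tails) and render it as text."""
    rng = random.Random(seed)
    flips = ["Tails" if rng.random() < p_tails else "Heads" for _ in range(length)]
    return ", ".join(flips)


def sample_continuation(p_tails: float, context_length: int, model: str = "gpt-3.5-turbo") -> str:
    """Query the model for a continuation of the flip sequence at temperature 1.0."""
    context = make_context(p_tails, context_length)
    prompt = f"Continue the following sequence of coin flips:\n{context}, "  # hypothetical prompt wording
    response = client.chat.completions.create(
        model=model,       # hypothetical choice; the paper evaluates several GPT-3.5+ and open models
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,   # default temperature, central to the paper's analysis
        max_tokens=200,    # rough budget; outputs are later cropped to |y| = 50 flips
    )
    return response.choices[0].message.content


# Example: collect 200 output sequences per P(Tails) value listed in the experiment setup.
p_tails_values = [.05, .1, .2, .3, .4, .49, .5, .51, .6, .7, .8, .9, .95]
outputs = {
    p: [sample_continuation(p, context_length=50) for _ in range(200)]
    for p in p_tails_values
}
```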