In-Context Learning Dynamics with Random Binary Sequences
Authors: Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tomer D. Ullman
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a framework that enables us to analyze in-context learning dynamics to understand the latent concepts underlying LLMs' behavioral patterns. This provides a more nuanced understanding than success-or-failure evaluation benchmarks, but does not require observing internal activations as a mechanistic interpretation of circuits would. Inspired by the cognitive science of human randomness perception, we use random binary sequences as context and study dynamics of in-context learning by manipulating properties of context data, such as sequence length. In the latest GPT-3.5+ models, we find emergent abilities to generate seemingly random numbers and learn basic formal languages, with striking in-context learning dynamics where model outputs transition sharply from seemingly random behaviors to deterministic repetition. |
| Researcher Affiliation | Collaboration | Eric J. Bigelow¹˒², Ekdeep Singh Lubana²˒³, Robert P. Dick³, Hidenori Tanaka²˒⁴, Tomer D. Ullman¹˒². ¹Psychology Department, Harvard University, Cambridge, MA, USA; ²Center for Brain Science, Harvard University, Cambridge, MA, USA; ³EECS Department, University of Michigan, Ann Arbor, MI, USA; ⁴Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All code and data used for this paper are available at https://github.com/ebigelow/ICL-Random-Binary. |
| Open Datasets | Yes | All code and data used for this paper are available at https://github.com/ebigelow/ICL-Random-Binary. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages or exact counts). It discusses varying context length and P(Tails) for data generation but not explicit dataset partitioning for reproducibility. |
| Hardware Specification | Yes | LLM inference was run on 2x NVIDIA A100 80GB GPUs in Harvard's FAS-RC cluster. |
| Software Dependencies | Yes | All calls were made with the OpenAI API, using default parameters, including (important to our analysis) a temperature parameter of 1.0. ... We evaluated the following LLMs: meta-llama/Llama-2-7b-hf (float32) ... mistralai/Mixtral-8x7B-Instruct-v0.1 (float16) ... allenai/tulu-2-dpo-70b (float16) |
| Experiment Setup | Yes | Temperature parameters are set to 1, and other parameters follow OpenAI API defaults; see App. B for more details. ... We collect 200 output sequences y for each LLM at each P(Tails) ∈ {.05, .1, .2, .3, .4, .49, .5, .51, .60, .70, .80, .90, .95}, cropping output tokens to \|y\| = 50 to limit cost and for simplicity. (A minimal code sketch of this data-collection loop follows the table.) |
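
The rows above describe the data-generation and querying setup in prose; the sketch below shows one way such a loop could be reproduced with the OpenAI Python client. The prompt wording, model name, `max_tokens` budget, and context length are illustrative assumptions, not the authors' exact configuration (their code is available in the linked repository).

```python
import random
from openai import OpenAI  # assumes the official OpenAI Python client (>= 1.0) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def make_context(p_tails: float, length: int, seed=None) -> str:
    """Sample a binary flip sequence with the given P(Tails) and render it as text."""
    rng = random.Random(seed)
    flips = ["Tails" if rng.random() < p_tails else "Heads" for _ in range(length)]
    return ", ".join(flips)


def sample_continuation(p_tails: float, context_length: int, model: str = "gpt-3.5-turbo") -> str:
    """Query the model for a continuation of the flip sequence at temperature 1.0."""
    context = make_context(p_tails, context_length)
    prompt = f"Continue the following sequence of coin flips:\n{context}, "  # hypothetical prompt wording
    response = client.chat.completions.create(
        model=model,       # hypothetical choice; the paper evaluates several GPT-3.5+ and open models
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,   # default temperature, central to the paper's analysis
        max_tokens=200,    # rough budget; outputs are later cropped to |y| = 50 flips
    )
    return response.choices[0].message.content


# Example: collect 200 output sequences per P(Tails) value listed in the experiment setup.
p_tails_values = [.05, .1, .2, .3, .4, .49, .5, .51, .6, .7, .8, .9, .95]
outputs = {
    p: [sample_continuation(p, context_length=50) for _ in range(200)]
    for p in p_tails_values
}
```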