In-Context Learning Agents Are Asymmetric Belief Updaters

Authors: Johannes A. Schubert, Akshay Kumar Jagadish, Marcel Binz, Eric Schulz

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study the in-context learning dynamics of large language models (LLMs) using three instrumental learning tasks adapted from cognitive psychology. We find that LLMs update their beliefs in an asymmetric manner and learn more from better-than-expected outcomes than from worse-than-expected ones. Furthermore, we show that this effect reverses when learning about counterfactual feedback and disappears when no agency is implied. We corroborate these findings by investigating idealized in-context learning agents derived through meta-reinforcement learning, where we observe similar patterns.
Researcher Affiliation | Academia | Computational Principles of Intelligence Lab, Max Planck Institute for Biological Cybernetics, Tübingen, Germany; Institute for Human-Centered AI, Helmholtz Computational Health Center, Munich, Germany. Correspondence to: Johannes A. Schubert <johannes.schubert@tue.mpg.de>
Pseudocode | No | The paper describes methods and models using natural language and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is publicly available at https://github.com/jschbrt/In-Context-Learning-Dynamics.
Open Datasets | No | The paper describes generating synthetic data through LLM simulations and Meta-RL agent training within defined task environments, but it does not use or provide access information for a publicly available, pre-existing dataset.
Dataset Splits | No | The paper describes simulations and Meta-RL agent training but does not provide details on specific training, validation, or test dataset splits for reproducibility.
Hardware Specification | No | The paper mentions that a simulated run took 'between ten seconds and three minutes on a standard desktop computer,' but it does not provide specific hardware details such as GPU or CPU models, or memory specifications.
Software Dependencies | No | The paper mentions using scipy's minimize function and the L-BFGS-B algorithm, but it does not specify version numbers for these software components.
Experiment Setup | Yes | We used Claude-1.2 as the reference LLM for all our experiments via its API, with the temperature set to 0.0. The agent consisted of a Transformer network with a model dimension of 8, two feedforward layers with a dimension of 128, and eight attention heads, followed by two linear layers that output a policy and a value estimate, respectively. The initial learning rate for ADAM was 0.0003. For the actor-critic loss, we used a discount factor of 0.8 and weighted the critic loss with 0.5. Starting with an entropy coefficient of 1, we linearly decayed the influence of the entropy term to 0 after half of the 5,000 episodes. We used a batch size of 64 during training.
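
The Experiment Setup row above fixes most hyperparameters of the meta-RL agent. The following is a minimal PyTorch sketch that wires those reported numbers together (model dimension 8, feedforward dimension 128, 8 attention heads, two policy/value output heads, ADAM at 0.0003, discount 0.8, critic weight 0.5, entropy coefficient decaying from 1 to 0 over half of 5,000 episodes, batch size 64). The class name, `input_size`, `n_actions`, and the omission of positional encoding and causal masking are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MetaRLAgent(nn.Module):
    """Transformer actor-critic sized to match the quoted setup."""

    def __init__(self, input_size: int, n_actions: int, d_model: int = 8):
        super().__init__()
        self.embed = nn.Linear(input_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.policy_head = nn.Linear(d_model, n_actions)  # action logits
        self.value_head = nn.Linear(d_model, 1)           # value estimate

    def forward(self, x):
        # x: (batch, time, input_size) sequence of observations and past rewards
        h = self.encoder(self.embed(x))
        return self.policy_head(h), self.value_head(h)

agent = MetaRLAgent(input_size=4, n_actions=2)  # input_size / n_actions are placeholders
optimizer = torch.optim.Adam(agent.parameters(), lr=3e-4)

# Training constants quoted in the Experiment Setup row
gamma = 0.8            # discount factor for the actor-critic loss
critic_weight = 0.5    # weight on the value (critic) loss term
n_episodes = 5_000
batch_size = 64

def entropy_coefficient(episode: int) -> float:
    """Linear decay of the entropy bonus from 1 to 0 over the first half of training."""
    return max(0.0, 1.0 - episode / (n_episodes / 2))
```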
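
The asymmetric updating effect summarized in the Research Type row is conventionally quantified by fitting a Rescorla-Wagner model with separate learning rates for positive and negative prediction errors, and the Software Dependencies row notes that scipy's minimize function with the L-BFGS-B algorithm is used for model fitting. The sketch below illustrates that standard approach under assumed details (a softmax policy with a fixed inverse temperature and hypothetical bandit data); it is not the authors' exact fitting code.

```python
import numpy as np
from scipy.optimize import minimize

def asymmetric_rw_nll(params, choices, rewards, n_options=2, beta=5.0):
    """Negative log-likelihood of a Rescorla-Wagner model with separate
    learning rates for positive and negative prediction errors."""
    alpha_pos, alpha_neg = params
    values = np.full(n_options, 0.5)  # initial value estimates
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        # softmax choice probabilities from the current value estimates
        logits = beta * values
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        nll -= np.log(probs[choice] + 1e-12)
        # asymmetric update: learning rate depends on the sign of the prediction error
        delta = reward - values[choice]
        alpha = alpha_pos if delta > 0 else alpha_neg
        values[choice] += alpha * delta
    return nll

# hypothetical example data from one simulated two-armed bandit run
choices = np.array([0, 1, 0, 0, 1, 0])
rewards = np.array([1, 0, 1, 0, 1, 1])

result = minimize(
    asymmetric_rw_nll,
    x0=[0.3, 0.3],                    # initial guesses for alpha+ and alpha-
    args=(choices, rewards),
    method="L-BFGS-B",
    bounds=[(0.0, 1.0), (0.0, 1.0)],  # learning rates constrained to [0, 1]
)
alpha_pos, alpha_neg = result.x
print(f"alpha+ = {alpha_pos:.3f}, alpha- = {alpha_neg:.3f}")
```

Under this parameterization, learning more from better-than-expected outcomes corresponds to a fitted alpha+ larger than alpha-, and the reported reversal for counterfactual feedback corresponds to the opposite ordering.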