In-Context Learning Agents Are Asymmetric Belief Updaters
Authors: Johannes A. Schubert, Akshay Kumar Jagadish, Marcel Binz, Eric Schulz
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the in-context learning dynamics of large language models (LLMs) using three instrumental learning tasks adapted from cognitive psychology. We find that LLMs update their beliefs in an asymmetric manner and learn more from better-than-expected outcomes than from worse-than-expected ones. Furthermore, we show that this effect reverses when learning about counterfactual feedback and disappears when no agency is implied. We corroborate these findings by investigating idealized in-context learning agents derived through meta-reinforcement learning, where we observe similar patterns. |
| Researcher Affiliation | Academia | 1 Computational Principles of Intelligence Lab, Max Planck Institute for Biological Cybernetics, Tübingen, Germany; 2 Institute for Human-Centered AI, Helmholtz Computational Health Center, Munich, Germany. Correspondence to: Johannes A. Schubert <johannes.schubert@tue.mpg.de> |
| Pseudocode | No | The paper describes methods and models using natural language and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/jschbrt/In-Context-Learning-Dynamics. |
| Open Datasets | No | The paper describes generating synthetic data through LLM simulations and Meta-RL agent training within defined task environments, but it does not use or provide access information for a publicly available, pre-existing dataset. |
| Dataset Splits | No | The paper describes simulations and Meta-RL agent training but does not provide details on specific training, validation, or test dataset splits for reproducibility. |
| Hardware Specification | No | The paper mentions that a simulated run took 'between ten seconds and three minutes on a standard desktop computer,' but it does not provide specific hardware details such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using SciPy's minimize function and the L-BFGS-B algorithm, but it does not specify version numbers for these software components. (An illustrative fitting sketch appears below the table.) |
| Experiment Setup | Yes | We used Claude-1.2 as the reference LLM for all our experiments via its API with the temperature set to 0.0. The agent consisted of a Transformer network with a model dimension of 8 (the input size), two feedforward layers with a dimension of 128, and eight attention heads, followed by two linear layers that output a policy and a value estimate, respectively. The initial learning rate for ADAM was 0.0003. For the actor-critic loss, we used a discount factor of 0.8 and weighted the critic loss with 0.5. Starting with an entropy coefficient of 1, we linearly decayed the influence of the entropy term to 0 after half of the 5,000 episodes. We used a batch size of 64 during training. (An illustrative agent sketch appears below the table.) |
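
To make the asymmetric belief updating described in the Research Type row concrete, the sketch below implements a value update with separate learning rates for better-than-expected and worse-than-expected outcomes. The function name, learning-rate values, and reward probability are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def asymmetric_update(value, reward, alpha_pos, alpha_neg):
    """Update a value estimate with separate learning rates for
    positive and negative prediction errors (asymmetric belief updating)."""
    delta = reward - value                  # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return value + alpha * delta

# Illustrative simulation of an agent that learns more from
# better-than-expected outcomes (alpha_pos > alpha_neg).
rng = np.random.default_rng(0)
value = 0.5
for _ in range(100):
    reward = rng.binomial(1, 0.7)           # hypothetical reward probability
    value = asymmetric_update(value, reward, alpha_pos=0.4, alpha_neg=0.2)
print(f"final value estimate: {value:.2f}")
```

With alpha_pos > alpha_neg, the estimate is pulled more strongly toward rewarded outcomes, which is the qualitative signature the paper reports for factual feedback (and which reverses for counterfactual feedback).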
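
The Software Dependencies row notes that model parameters were fitted with SciPy's minimize function and the L-BFGS-B algorithm. The following is a minimal sketch of how such a fit could look, assuming a two-armed bandit task, a softmax policy, and the asymmetric update above; the likelihood function, bounds, and example data are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(params, choices, rewards):
    """Negative log-likelihood of a two-armed bandit choice sequence under an
    asymmetric learning model with a softmax policy (illustrative sketch)."""
    alpha_pos, alpha_neg, beta = params
    values = np.zeros(2)
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        probs = np.exp(beta * values) / np.exp(beta * values).sum()
        nll -= np.log(probs[choice] + 1e-12)
        delta = reward - values[choice]
        alpha = alpha_pos if delta >= 0 else alpha_neg
        values[choice] += alpha * delta
    return nll

# Placeholder choice/reward data; in practice these would come from the
# LLM or meta-RL simulations described above.
choices = np.array([0, 1, 0, 0, 1, 0, 0, 0])
rewards = np.array([1, 0, 1, 1, 0, 1, 0, 1])

result = minimize(
    negative_log_likelihood,
    x0=[0.3, 0.3, 5.0],                     # initial guesses for alpha+, alpha-, beta
    args=(choices, rewards),
    method="L-BFGS-B",
    bounds=[(0.0, 1.0), (0.0, 1.0), (0.0, 20.0)],
)
print(result.x)                             # fitted alpha_pos, alpha_neg, beta
```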
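
The Experiment Setup row specifies the meta-RL agent's architecture and training hyperparameters. The PyTorch sketch below wires those numbers together; the observation encoding, action space size, and exact layer arrangement are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class MetaRLAgent(nn.Module):
    """Transformer actor-critic sketch following the quoted setup: model
    dimension 8, two layers with feedforward dimension 128, eight attention
    heads, and linear policy/value heads. Input and action dimensions are
    placeholders."""

    def __init__(self, input_dim=8, n_actions=2, d_model=8,
                 n_heads=8, ff_dim=128, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(input_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=ff_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.policy_head = nn.Linear(d_model, n_actions)   # action logits
        self.value_head = nn.Linear(d_model, 1)            # state-value estimate

    def forward(self, x):
        h = self.encoder(self.embed(x))                    # (batch, time, d_model)
        return self.policy_head(h), self.value_head(h)

agent = MetaRLAgent()
optimizer = torch.optim.Adam(agent.parameters(), lr=3e-4)  # initial ADAM learning rate 0.0003

# Actor-critic loss weights from the quoted setup.
gamma, critic_weight, n_episodes, batch_size = 0.8, 0.5, 5000, 64

def entropy_coef(episode):
    """Linear decay of the entropy bonus from 1 to 0 over the first 2,500 episodes."""
    return max(0.0, 1.0 - episode / (n_episodes / 2))

# Example forward pass with a batch of 64 episodes of 10 timesteps each.
obs = torch.zeros(batch_size, 10, 8)
logits, values = agent(obs)
```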