Latent Learning Progress Drives Autonomous Goal Selection in Human Reinforcement Learning

Authors: Gaia Molinaro, Cédric Colas, Pierre-Yves Oudeyer, Anne Collins

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To test this hypothesis, we designed a hierarchical reinforcement learning task in which human participants (N = 175) repeatedly chose their own goals and learned goal-conditioned policies. Our behavioral and computational modeling results confirm the influence of latent learning progress on goal selection and uncover inter-individual differences, partially mediated by recognition of the environment's hierarchical structure.
Researcher Affiliation | Collaboration | Gaia Molinaro, University of California, Berkeley (gaiamolinaro@berkeley.edu); Cédric Colas, Massachusetts Institute of Technology (ccolas@mit.edu); Pierre-Yves Oudeyer, Inria Centre at the University of Bordeaux (pierre-yves.oudeyer@inria.fr); Anne G. E. Collins, University of California, Berkeley (annecollins@berkeley.edu)
Pseudocode | No | No section or figure explicitly labeled 'Pseudocode' or 'Algorithm' was found within the paper.
Open Source Code | No | No public code repository or release is provided. The NeurIPS checklist states only: 'The authors reserve the right to perform further analyses on the present data before publicly releasing them.'
Open Datasets | No | The paper describes the use of a 'training phase' and a 'learning phase' with human participants, but does not provide concrete access information (link, DOI, etc.) for public availability of the dataset used. The NeurIPS checklist states: 'The authors reserve the right to perform further analyses on the present data before publicly releasing them.'
Dataset Splits | No | The paper describes 'training, learning, and testing phases' but does not explicitly mention a distinct validation set or split with specific percentages, sample counts, or predefined splits.
Hardware Specification | No | None of the analyses and modeling processes were so computationally expensive as to require special computing resources.
Software Dependencies | No | The paper mentions the 'Unity game engine', 'hierarchical Bayesian inference (HBI; [88])', and a 'standard Q-value reinforcement learning model [67]', but does not provide specific version numbers for these software components or any other libraries.
Experiment Setup | Yes | The experiment was implemented in the Unity game engine and presented as an interactive online game, in which healthy human participants (N = 175; see Appendix A for details) played the role of alchemists and could learn to make different potions from sets of ingredients. ... We model human goal selection as a multi-arm bandit problem, where the probability of selecting a goal depends on its subjective value relative to other goals. ... The probability P_t(g) of choosing goal g among possible goals G on trial t is obtained through a softmax function over the goal values ... Subjective goal values are updated as a function of experience through the delta rule [87]: V^f_{t+1}(g) = V^f_t(g) + α·δ^f_t(g), where α is a learning rate for value updates and δ^f is factor-dependent. ... Fit parameters included a shared α across factors and weighting parameters β_f for each factor in the model.
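To make the goal-selection model described above concrete, below is a minimal Python sketch (not the authors' code) of softmax goal choice over subjective goal values combined with a delta-rule update. The parameter values, the four-goal setup, and the prediction-error-style stand-in for the factor-dependent δ^f are illustrative assumptions; the paper's actual update targets depend on latent learning progress and other factors.

import numpy as np

def softmax_choice(values, beta, rng):
    # P_t(g) is proportional to exp(beta * V_t(g)), i.e., a softmax over goal values
    logits = beta * np.asarray(values, dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(values), p=probs)), probs

def delta_rule_update(values, goal, delta, alpha):
    # V_{t+1}(g) = V_t(g) + alpha * delta, applied only to the chosen goal
    values = np.array(values, dtype=float)
    values[goal] += alpha * delta
    return values

# Toy usage: four hypothetical goals, illustrative parameter values.
rng = np.random.default_rng(0)
values = np.zeros(4)
alpha, beta = 0.3, 5.0
for t in range(20):
    goal, _ = softmax_choice(values, beta, rng)
    outcome = float(rng.random() < 0.5)  # placeholder feedback signal
    values = delta_rule_update(values, goal, outcome - values[goal], alpha)

In the paper's full model, separate δ^f terms (e.g., for learning progress) share the learning rate α, and factor-specific weights β_f scale each factor's contribution to the goal values entering the softmax.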