Latent Learning Progress Drives Autonomous Goal Selection in Human Reinforcement Learning
Authors: Gaia Molinaro, Cédric Colas, Pierre-Yves Oudeyer, Anne Collins
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test this hypothesis, we designed a hierarchical reinforcement learning task in which human participants (N = 175) repeatedly chose their own goals and learned goal-conditioned policies. Our behavioral and computational modeling results confirm the influence of latent learning progress on goal selection and uncover inter-individual differences, partially mediated by recognition of the environment's hierarchical structure. |
| Researcher Affiliation | Collaboration | Gaia Molinaro, University of California, Berkeley (gaiamolinaro@berkeley.edu); Cédric Colas, Massachusetts Institute of Technology (ccolas@mit.edu); Pierre-Yves Oudeyer, Inria Centre at the University of Bordeaux (pierre-yves.oudeyer@inria.fr); Anne G. E. Collins, University of California, Berkeley (annecollins@berkeley.edu) |
| Pseudocode | No | No section or figure explicitly labeled 'Pseudocode' or 'Algorithm' was found within the paper. |
| Open Source Code | No | The authors reserve the right to perform further analyses on the present data before publicly releasing them. |
| Open Datasets | No | The paper describes the use of a 'training phase' and a 'learning phase' with human participants, but does not provide concrete access information (link, DOI, etc.) for public availability of the dataset used. The NeurIPS checklist states: 'The authors reserve the right to perform further analyses on the present data before publicly releasing them.' |
| Dataset Splits | No | The paper describes 'training, learning, and testing phases' but does not explicitly mention a distinct validation set or split with specific percentages, sample counts, or predefined splits. |
| Hardware Specification | No | None of the analyses and modeling processes were computationally expensive enough to require special computing resources. |
| Software Dependencies | No | The paper mentions 'Unity game engine' and 'hierarchical Bayesian inference (HBI; [88])' and 'standard Q-value reinforcement learning model [67]', but does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | The experiment was implemented in the Unity game engine and presented as an interactive online game, in which healthy human participants (N = 175; see Appendix A for details) played the role of alchemists and could learn to make different potions from sets of ingredients. ... We model human goal selection as a multi-arm bandit problem, where the probability of selecting a goal depends on its subjective value relative to other goals. ... The probability $P_t(g)$ of choosing goal $g$ among possible goals $G$ on trial $t$ is obtained through a softmax function over the goal values ... Subjective goal values are updated as a function of experience through the delta rule [87]: $V^f_{t+1}(g) = V^f_t(g) + \alpha\,\delta^f_t(g)$, where $\alpha$ is a learning rate for value updates and $\delta^f$ is factor-dependent. ... fit parameters included a shared $\alpha$ across factors and weighting parameters $\beta_f$ for each factor in the model. (A minimal code sketch of this goal-selection model appears after the table.) |
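The paper's code has not been released, so the following Python sketch is only an illustration of the goal-selection model as quoted above: a softmax choice over factor-weighted subjective goal values with a delta-rule update, a shared learning rate $\alpha$, and per-factor weights $\beta_f$. The number of goals and factors, the parameter values, and the factor-dependent prediction errors are placeholders assumed here for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 candidate goals and 2 value factors (e.g., one
# tracking latent learning progress). The actual goal set, factors, and
# parameter values used in the paper are not reproduced here.
n_goals, n_factors = 4, 2
alpha = 0.3                          # shared learning rate across factors
beta = np.array([2.0, 1.0])          # per-factor weighting parameters beta_f
V = np.zeros((n_factors, n_goals))   # subjective goal values V^f_t(g)

def choose_goal(V, beta):
    """Softmax over factor-weighted goal values: P_t(g)."""
    utilities = beta @ V                     # combine factors into one utility per goal
    p = np.exp(utilities - utilities.max())  # subtract max for numerical stability
    p /= p.sum()
    return rng.choice(len(p), p=p), p

def update_values(V, goal, deltas, alpha):
    """Delta-rule update: V^f_{t+1}(g) = V^f_t(g) + alpha * delta^f_t(g)."""
    V = V.copy()
    V[:, goal] += alpha * deltas
    return V

# One simulated trial with made-up, factor-dependent prediction errors.
goal, p = choose_goal(V, beta)
deltas = rng.normal(size=n_factors)   # placeholder for the factor-dependent delta^f_t(g)
V = update_values(V, goal, deltas, alpha)
print(f"chose goal {goal} with probability {p[goal]:.2f}")
```

In the paper, one of the factors entering goal value reflects latent learning progress, which is the quantity the authors report as driving goal selection; here the prediction errors are random placeholders rather than estimates of learning progress.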