Latent Learning Progress Drives Autonomous Goal Selection in Human Reinforcement Learning
Authors: Gaia Molinaro, Cédric Colas, Pierre-Yves Oudeyer, Anne Collins
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test this hypothesis, we designed a hierarchical reinforcement learning task in which human participants (N = 175) repeatedly chose their own goals and learned goal-conditioned policies. Our behavioral and computational modeling results confirm the influence of latent learning progress on goal selection and uncover inter-individual differences, partially mediated by recognition of the environment's hierarchical structure. |
| Researcher Affiliation | Collaboration | Gaia Molinaro, University of California, Berkeley (gaiamolinaro@berkeley.edu); Cédric Colas, Massachusetts Institute of Technology (ccolas@mit.edu); Pierre-Yves Oudeyer, Inria Centre at the University of Bordeaux (pierre-yves.oudeyer@inria.fr); Anne G. E. Collins, University of California, Berkeley (annecollins@berkeley.edu) |
| Pseudocode | No | No section or figure explicitly labeled 'Pseudocode' or 'Algorithm' was found within the paper. |
| Open Source Code | No | The authors reserve the right to perform further analyses on the present data before publicly releasing them. |
| Open Datasets | No | The paper describes the use of a 'training phase' and a 'learning phase' with human participants, but does not provide concrete access information (link, DOI, etc.) for public availability of the dataset used. The NeurIPS checklist states: 'The authors reserve the right to perform further analyses on the present data before publicly releasing them.' |
| Dataset Splits | No | The paper describes 'training, learning, and testing phases' but does not explicitly mention a distinct validation set or split with specific percentages, sample counts, or predefined splits. |
| Hardware Specification | No | None of the analyses and modeling processes were computationally expensive enough to require special computing resources. |
| Software Dependencies | No | The paper mentions 'Unity game engine' and 'hierarchical Bayesian inference (HBI; [88])' and 'standard Q-value reinforcement learning model [67]', but does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | The experiment was implemented in the Unity game engine and presented as an interactive online game, in which healthy human participants (N = 175; see Appendix A for details) played the role of alchemists and could learn to make different potions from sets of ingredients. ... We model human goal selection as a multi-arm bandit problem, where the probability of selecting a goal depends on its subjective value relative to other goals. ... The probability $P_t(g)$ of choosing goal $g$ among possible goals $G$ on trial $t$ is obtained through a softmax function over the goal values ... Subjective goal values are updated as a function of experience through the delta rule [87]: $V^f_{t+1}(g) = V^f_t(g) + \alpha\,\delta^f_t(g)$, where $\alpha$ is a learning rate for value updates and $\delta^f$ is factor-dependent. ... fit parameters included a shared $\alpha$ across factors and weighting parameters $\beta_f$ for each factor in the model. (A minimal code sketch of this goal-selection model appears after the table.) |
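The paper's code has not been released, so the following Python sketch is only an illustration of the goal-selection model as quoted above: a softmax choice over factor-weighted subjective goal values with a delta-rule update, a shared learning rate $\alpha$, and per-factor weights $\beta_f$. The number of goals and factors, the parameter values, and the factor-dependent prediction errors are placeholders assumed here for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 candidate goals and 2 value factors (e.g., one
# tracking latent learning progress). The actual goal set, factors, and
# parameter values used in the paper are not reproduced here.
n_goals, n_factors = 4, 2
alpha = 0.3                          # shared learning rate across factors
beta = np.array([2.0, 1.0])          # per-factor weighting parameters beta_f
V = np.zeros((n_factors, n_goals))   # subjective goal values V^f_t(g)

def choose_goal(V, beta):
    """Softmax over factor-weighted goal values: P_t(g)."""
    utilities = beta @ V                     # combine factors into one utility per goal
    p = np.exp(utilities - utilities.max())  # subtract max for numerical stability
    p /= p.sum()
    return rng.choice(len(p), p=p), p

def update_values(V, goal, deltas, alpha):
    """Delta-rule update: V^f_{t+1}(g) = V^f_t(g) + alpha * delta^f_t(g)."""
    V = V.copy()
    V[:, goal] += alpha * deltas
    return V

# One simulated trial with made-up, factor-dependent prediction errors.
goal, p = choose_goal(V, beta)
deltas = rng.normal(size=n_factors)   # placeholder for the factor-dependent delta^f_t(g)
V = update_values(V, goal, deltas, alpha)
print(f"chose goal {goal} with probability {p[goal]:.2f}")
```

In the paper, one of the factors entering goal value reflects latent learning progress, which is the quantity the authors report as driving goal selection; here the prediction errors are random placeholders rather than estimates of learning progress.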