Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Latent Learning Progress Drives Autonomous Goal Selection in Human Reinforcement Learning

Authors: Gaia Molinaro, Cédric Colas, Pierre-Yves Oudeyer, Anne Collins

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To test this hypothesis, we designed a hierarchical reinforcement learning task in which human participants (N = 175) repeatedly chose their own goals and learned goal-conditioned policies. Our behavioral and computational modeling results confirm the influence of latent learning progress on goal selection and uncover inter-individual differences, partially mediated by recognition of the environment's hierarchical structure.
Researcher Affiliation | Collaboration | Gaia Molinaro, University of California, Berkeley (EMAIL); Cédric Colas, Massachusetts Institute of Technology (EMAIL); Pierre-Yves Oudeyer, Inria Centre at the University of Bordeaux (EMAIL); Anne G. E. Collins, University of California, Berkeley (EMAIL)
Pseudocode | No | No section or figure explicitly labeled 'Pseudocode' or 'Algorithm' was found within the paper.
Open Source Code | No | The authors reserve the right to perform further analyses on the present data before publicly releasing them.
Open Datasets | No | The paper describes the use of a 'training phase' and a 'learning phase' with human participants, but does not provide concrete access information (link, DOI, etc.) for public availability of the dataset used. The NeurIPS checklist states: 'The authors reserve the right to perform further analyses on the present data before publicly releasing them.'
Dataset Splits | No | The paper describes 'training, learning, and testing phases' but does not explicitly mention a distinct validation set or a split defined by percentages, sample counts, or predefined partitions.
Hardware Specification | No | None of the analyses and modeling processes were so computationally expensive as to require special computer resources.
Software Dependencies | No | The paper mentions the 'Unity game engine', 'hierarchical Bayesian inference (HBI; [88])', and a 'standard Q-value reinforcement learning model [67]', but does not provide specific version numbers for these software components or any other libraries.
Experiment Setup | Yes | The experiment was implemented in the Unity game engine and presented as an interactive online game, in which healthy human participants (N = 175; see Appendix A for details) played the role of alchemists and could learn to make different potions from sets of ingredients. ... We model human goal selection as a multi-arm bandit problem, where the probability of selecting a goal depends on its subjective value relative to other goals. ... The probability P_t(g) of choosing goal g among possible goals G on trial t is obtained through a softmax function over the goal values ... Subjective goal values are updated as a function of experience through the delta rule [87]: V^f_{t+1}(g) = V^f_t(g) + α δ^f_t(g), where α is a learning rate for value updates and δ^f is factor-dependent. ... fit parameters included a shared α across factors and weighting parameters β_f for each factor in the model.
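As a minimal illustration of the model class described in this row (softmax goal selection over subjective values plus a delta-rule update), the two operations can be sketched in Python. This is not the authors' code, which has not been released; the function names and the NumPy dependency are assumptions, and a single inverse temperature `beta` stands in for the per-factor weights β_f:

```python
import numpy as np

def goal_choice_probs(values, beta):
    """Softmax over subjective goal values: P_t(g) ∝ exp(beta * V_t(g))."""
    z = beta * np.asarray(values, dtype=float)
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def delta_rule_update(values, chosen_goal, delta, alpha):
    """Delta rule: V_{t+1}(g) = V_t(g) + alpha * delta(g) for the chosen goal."""
    values = np.asarray(values, dtype=float).copy()
    values[chosen_goal] += alpha * delta
    return values
```

For example, after a positive prediction error on a chosen goal, its value rises by `alpha * delta`, which in turn raises its softmax choice probability on the next trial, the qualitative pattern the paper's model captures.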