Modeling Human Exploration Through Resource-Rational Reinforcement Learning

Authors: Marcel Binz, Eric Schulz

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that RR-RL2 captures many aspects of human exploration by reanalyzing data from three previously conducted psychological studies. First, we show that it explains human choices in a two-armed bandit task better than traditional approaches, such as Thompson sampling [Thompson, 1933], upper confidence bound (UCB) algorithms [Kaufmann et al., 2012], and mixtures thereof [Gershman, 2018]. We then verify that the manipulation of computational resources in our class of models matches the manipulation of resources in human subjects in two different contexts." (A hedged sketch of these baseline choice rules appears below the table.)
Researcher Affiliation | Academia | Marcel Binz, MPI for Biological Cybernetics, Tübingen, Germany (marcel.binz@tue.mpg.de); Eric Schulz, MPI for Biological Cybernetics, Tübingen, Germany (eric.schulz@tue.mpg.de)
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format.
Open Source Code | Yes | "A public implementation of our model and the following analyses is available under https://github.com/marcelbinz/resource-rational-reinforcement-learning."
Open Datasets | No | The paper states, 'We used existing data from experiments conducted in previous work. This information can be found in the original articles.' While the original studies are cited (e.g., Gershman [2018], Bechara et al. [1994], Somerville et al. [2017]), the paper itself does not provide concrete access information (e.g., direct links, DOIs, or repository names) for these datasets.
Dataset Splits | No | The paper describes training models on distributions over bandit problems and subsequently evaluating them on specific tasks (e.g., the Iowa Gambling Task) or by comparing model behavior to human data. However, it does not report training/validation/test splits (e.g., percentages or sample counts), nor does it refer to predefined standard splits for its meta-learning procedure. (A hedged sketch of sampling such a task distribution appears below the table.)
Hardware Specification | No | The paper states, 'No special compute resources were used during this project. Models were trained using standard CPUs on an internal cluster.' This description is too general and does not provide specific hardware details such as CPU models, memory specifications, or GPU types used for the experiments.
Software Dependencies | No | The paper mentions training parameters using 'Adam [Kingma and Ba, 2014]' and using 'the local reparameterization trick [Kingma et al., 2015]', but it does not specify any software dependencies (e.g., libraries, frameworks, or programming languages) with their corresponding version numbers.
Experiment Setup | Yes | "The parameters of the recurrent neural networks were trained using Adam [Kingma and Ba, 2014] with a learning rate of 0.001. The number of hidden units was set to 256. For the Bayesian neural networks, we used the local reparameterization trick [Kingma et al., 2015] and assumed a Gaussian prior and a factorized Gaussian posterior for all of the weights. ... We trained RR-RL2 with a targeted description length of {1, 2, ..., 10000} nats on the same distribution used in the original experimental study..." (A hedged training-setup sketch appears below the table.)
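
The Research Type row quotes the paper's baselines for the two-armed bandit reanalysis: Thompson sampling, UCB, and a mixture of the two. The sketch below is a minimal, hypothetical illustration of those choice rules, assuming a Gaussian bandit with known reward noise and conjugate (Kalman-style) posterior updates; the prior settings, noise level, and weights are illustrative assumptions, not values from the paper, and the hybrid rule only loosely follows the probit combination described by Gershman [2018].

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Posterior over each arm's mean reward, assuming a Gaussian bandit with
# known observation noise (illustrative choice, not taken from the paper).
mu = np.zeros(2)          # posterior means
var = np.ones(2) * 100.0  # posterior variances (broad prior)
obs_var = 10.0            # assumed reward noise variance


def update(arm, reward):
    """Conjugate Gaussian update of the chosen arm's posterior."""
    k = var[arm] / (var[arm] + obs_var)  # Kalman gain
    mu[arm] += k * (reward - mu[arm])
    var[arm] *= (1.0 - k)


def thompson_choice():
    """Thompson sampling: sample a mean for each arm, pick the larger sample."""
    return int(np.argmax(rng.normal(mu, np.sqrt(var))))


def ucb_choice(c=1.0):
    """UCB: pick the arm with the largest mean-plus-uncertainty bonus."""
    return int(np.argmax(mu + c * np.sqrt(var)))


def hybrid_choice_prob(w1=1.0, w2=1.0, w3=1.0):
    """Probit combination of value, relative uncertainty (UCB-like), and
    value over total uncertainty (Thompson-like), loosely following the
    hybrid model of Gershman [2018]; the weights here are placeholders."""
    v = mu[0] - mu[1]
    ru = np.sqrt(var[0]) - np.sqrt(var[1])
    tu = np.sqrt(var[0] + var[1])
    return norm.cdf(w1 * v + w2 * ru + w3 * v / tu)
```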
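
The Dataset Splits row notes that the models are meta-trained on distributions over bandit problems rather than on a fixed dataset with splits. The snippet below sketches what sampling tasks from such a distribution could look like; the Gaussian prior over arm means and the reward noise are illustrative stand-ins, not the distribution used in the original study.

```python
import numpy as np


def sample_bandit_task(rng, n_arms=2, mean_prior_sd=10.0, reward_sd=10.0):
    """Draw one bandit task: latent arm means sampled from a Gaussian prior.
    The prior and noise values here are illustrative assumptions."""
    arm_means = rng.normal(0.0, mean_prior_sd, size=n_arms)

    def pull(arm):
        # Noisy reward for the chosen arm of this particular task.
        return rng.normal(arm_means[arm], reward_sd)

    return pull


rng = np.random.default_rng(0)
pull = sample_bandit_task(rng)   # one task drawn from the distribution
reward = pull(0)                 # noisy reward from arm 0 of that task
```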
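
The Experiment Setup row quotes the reported hyperparameters: Adam with a learning rate of 0.001, 256 hidden units, and Bayesian weights with a Gaussian prior, a factorized Gaussian posterior, and the local reparameterization trick. Below is a minimal sketch consistent with that description, assuming a PyTorch implementation (the paper does not name a framework); the module names, the GRU stand-in for the recurrent network, the standard-normal prior, and all layer sizes other than the hidden size are my assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalReparamLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over its weights,
    sampled via the local reparameterization trick [Kingma et al., 2015].
    A standard-normal prior is assumed here for the KL term."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_logvar = nn.Parameter(torch.full((out_features, in_features), -10.0))

    def forward(self, x):
        # Sample pre-activations directly from their induced Gaussian.
        act_mu = F.linear(x, self.w_mu)
        act_var = F.linear(x.pow(2), self.w_logvar.exp())
        eps = torch.randn_like(act_mu)
        return act_mu + act_var.clamp_min(1e-12).sqrt() * eps

    def kl(self):
        # KL(q(w) || N(0, 1)) in nats, summed over all weights.
        return 0.5 * (self.w_mu.pow(2) + self.w_logvar.exp()
                      - self.w_logvar - 1.0).sum()


class MetaRLPolicy(nn.Module):
    """Recurrent policy: GRU with 256 hidden units feeding a variational
    action head (input and output sizes below are illustrative)."""

    def __init__(self, obs_dim, n_actions, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = LocalReparamLinear(hidden, n_actions)

    def forward(self, obs_seq, h=None):
        out, h = self.rnn(obs_seq, h)
        return self.head(out), h


policy = MetaRLPolicy(obs_dim=3, n_actions=2)
optimizer = torch.optim.Adam(policy.parameters(), lr=0.001)
# A full training loop would add the RL objective plus a constraint keeping
# the summed KL (the description length, in nats) near its target; omitted here.
```

In this sketch the kl() term plays the role of the description length in nats that RR-RL2 targets; an actual implementation would penalize or constrain it toward the chosen target value rather than leave it unconstrained.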