Meta-Learning of Structured Task Distributions in Humans and Machines

Authors: Sreejan Kumar, Ishita Dasgupta, Jonathan Cohen, Nathaniel Daw, Thomas Griffiths

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train a standard meta-learning agent, a recurrent network trained with model-free reinforcement learning, and compare it with human performance across the two task distributions. We find a double dissociation in which humans do better in the structured task distribution whereas agents do better in the null task distribution despite comparable statistical complexity.
Researcher Affiliation | Academia | Princeton Neuroscience Institute; Department of Computer Science, Princeton University; Department of Psychology, Princeton University
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | No | The paper mentions the Stable Baselines package (Hill et al., 2018) with a GitHub URL, but this refers to a third-party tool used by the authors, not the source code for the methodology developed in this paper.
Open Datasets | No | The paper describes how the authors generated their own structured and null task distributions based on a grammar and matched statistical properties, but it does not provide concrete access information (e.g., a link, DOI, or repository) for these datasets.
Dataset Splits | Yes | We performed a hyperparameter sweep (value function loss coefficient, entropy loss coefficient, learning rate) using a held-out validation set for evaluation (see Appendix). A rough sweep sketch follows the table.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The agent was trained using Advantage Actor Critic (A2C) from the Stable Baselines package (Hill et al., 2018). While a software package is mentioned, a specific version number for it is not provided. A hedged training sketch using this package follows the table.
Experiment Setup | Yes | The agent was trained with a linear learning rate schedule and a 0.9 discount factor. The reward function was: +1 for revealing red tiles, -1 for blue tiles, +10 for the last red tile, and -2 for choosing an already revealed tile. The agent was trained for 10^6 episodes. ... The final selected hyperparameters for both task distributions were: value function coefficient=0.000675, entropy coefficient=0.000675, learning rate=0.00235.
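
The authors' environment and training code are not released, so the following is a minimal sketch, not their implementation. The grid size, observation encoding, and board sampler are assumptions; only the reward scheme (+1 red, -1 blue, +10 last red, -2 already revealed), the 0.9 discount, the linear learning rate schedule, and the selected hyperparameters come from the table above. A recurrent (LSTM) policy from the Stable Baselines package (Hill et al., 2018) stands in for the paper's recurrent network.

```python
"""Hedged sketch: toy tile-reveal environment + A2C training (not the authors' code)."""
import gym
import numpy as np
from gym import spaces
from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv


class TileRevealEnv(gym.Env):
    """Toy 7x7 board of red/blue tiles; the agent reveals one tile per step (sizes are assumptions)."""

    def __init__(self, size=7, n_red=10):
        super().__init__()
        self.size, self.n_red = size, n_red
        self.action_space = spaces.Discrete(size * size)
        # Observation encoding (assumption): -1 = hidden, 0 = revealed blue, 1 = revealed red.
        self.observation_space = spaces.Box(low=-1, high=1, shape=(size * size,), dtype=np.float32)

    def reset(self):
        # Placeholder board sampler; the paper draws boards from structured/null task distributions.
        self.board = np.zeros(self.size * self.size, dtype=np.int8)
        self.board[np.random.choice(self.size * self.size, self.n_red, replace=False)] = 1
        self.revealed = np.zeros(self.size * self.size, dtype=bool)
        return self._obs()

    def _obs(self):
        obs = np.full(self.board.shape, -1.0, dtype=np.float32)
        obs[self.revealed] = self.board[self.revealed]
        return obs

    def step(self, action):
        if self.revealed[action]:
            reward = -2.0                           # -2 for choosing an already revealed tile
        elif self.board[action] == 1:
            self.revealed[action] = True
            last_red = not np.any(self.board[~self.revealed] == 1)
            reward = 10.0 if last_red else 1.0      # +10 for the last red tile, else +1
        else:
            self.revealed[action] = True
            reward = -1.0                           # -1 for a blue tile
        done = not np.any(self.board[~self.revealed] == 1)
        return self._obs(), reward, done, {}


if __name__ == "__main__":
    env = DummyVecEnv([lambda: TileRevealEnv()])
    # Reported hyperparameters; the LSTM policy is an assumption for "recurrent network".
    model = A2C("MlpLstmPolicy", env, gamma=0.9, lr_schedule="linear",
                vf_coef=0.000675, ent_coef=0.000675, learning_rate=0.00235, verbose=1)
    # The paper trains for 1e6 episodes; the timestep budget below is a placeholder.
    model.learn(total_timesteps=200000)
```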
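
The Dataset Splits row can be illustrated in the same spirit. The rough sketch below reuses TileRevealEnv from the block above; the candidate grids, training budget, and number of evaluation episodes are assumptions, and fresh boards at each reset stand in for the paper's held-out validation set. Only the swept quantities (value function loss coefficient, entropy loss coefficient, learning rate) come from the paper.

```python
"""Hedged sketch of a hyperparameter sweep over vf_coef, ent_coef, and learning rate."""
import itertools
import numpy as np
from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv


def validation_return(model, env, n_episodes=50):
    """Average episode return on freshly sampled boards (proxy for a held-out validation set)."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total, state = env.reset(), [False], 0.0, None
        while not done[0]:
            action, state = model.predict(obs, state=state, mask=done)
            obs, reward, done, _ = env.step(action)
            total += reward[0]
        returns.append(total)
    return float(np.mean(returns))


best, best_score = None, -np.inf
for vf, ent, lr in itertools.product([1e-4, 6.75e-4, 1e-3],    # value function loss coefficient (placeholder grid)
                                     [1e-4, 6.75e-4, 1e-3],    # entropy loss coefficient (placeholder grid)
                                     [1e-3, 2.35e-3, 5e-3]):   # learning rate (placeholder grid)
    env = DummyVecEnv([lambda: TileRevealEnv()])  # TileRevealEnv from the sketch above
    model = A2C("MlpLstmPolicy", env, gamma=0.9, lr_schedule="linear",
                vf_coef=vf, ent_coef=ent, learning_rate=lr, verbose=0)
    model.learn(total_timesteps=50000)            # short budget for the sketch only
    score = validation_return(model, env)
    if score > best_score:
        best, best_score = (vf, ent, lr), score
print("selected hyperparameters:", best)
```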