Meta-Learning of Structured Task Distributions in Humans and Machines

Authors: Sreejan Kumar, Ishita Dasgupta, Jonathan Cohen, Nathaniel Daw, Thomas Griffiths

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train a standard meta-learning agent, a recurrent network trained with model-free reinforcement learning, and compare it with human performance across the two task distributions. We find a double dissociation in which humans do better in the structured task distribution whereas agents do better in the null task distribution despite comparable statistical complexity.
Researcher Affiliation | Academia | Princeton Neuroscience Institute; Department of Computer Science, Princeton University; Department of Psychology, Princeton University
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | No | The paper mentions the Stable Baselines package (Hill et al., 2018) with a GitHub URL, but this refers to a third-party tool used by the authors, not the source code for the methodology developed in this paper.
Open Datasets | No | The paper describes how the authors generated their own structured and null task distributions based on a grammar and matched statistical properties, but it does not provide concrete access information (e.g., a link, DOI, or repository) for these datasets.
Dataset Splits | Yes | We performed a hyperparameter sweep (value function loss coefficient, entropy loss coefficient, learning rate) using a held-out validation set for evaluation (see Appendix). A rough sweep sketch follows the table.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The agent was trained using Advantage Actor Critic (A2C) from the Stable Baselines package (Hill et al., 2018). While a software package is mentioned, a specific version number for it is not provided. A hedged training sketch using this package follows the table.
Experiment Setup | Yes | The agent was trained with a linear learning rate schedule and a 0.9 discount factor. The reward function was: +1 for revealing red tiles, -1 for blue tiles, +10 for the last red tile, and -2 for choosing an already revealed tile. The agent was trained for 10^6 episodes. ... The final selected hyperparameters for both task distributions were: value function coefficient=0.000675, entropy coefficient=0.000675, learning rate=0.00235.
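
The authors' environment and training code are not released, so the following is a minimal sketch, not their implementation. The grid size, observation encoding, and board sampler are assumptions; only the reward scheme (+1 red, -1 blue, +10 last red, -2 already revealed), the 0.9 discount, the linear learning rate schedule, and the selected hyperparameters come from the table above. A recurrent (LSTM) policy from the Stable Baselines package (Hill et al., 2018) stands in for the paper's recurrent network.

```python
"""Hedged sketch: toy tile-reveal environment + A2C training (not the authors' code)."""
import gym
import numpy as np
from gym import spaces
from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv


class TileRevealEnv(gym.Env):
    """Toy 7x7 board of red/blue tiles; the agent reveals one tile per step (sizes are assumptions)."""

    def __init__(self, size=7, n_red=10):
        super().__init__()
        self.size, self.n_red = size, n_red
        self.action_space = spaces.Discrete(size * size)
        # Observation encoding (assumption): -1 = hidden, 0 = revealed blue, 1 = revealed red.
        self.observation_space = spaces.Box(low=-1, high=1, shape=(size * size,), dtype=np.float32)

    def reset(self):
        # Placeholder board sampler; the paper draws boards from structured/null task distributions.
        self.board = np.zeros(self.size * self.size, dtype=np.int8)
        self.board[np.random.choice(self.size * self.size, self.n_red, replace=False)] = 1
        self.revealed = np.zeros(self.size * self.size, dtype=bool)
        return self._obs()

    def _obs(self):
        obs = np.full(self.board.shape, -1.0, dtype=np.float32)
        obs[self.revealed] = self.board[self.revealed]
        return obs

    def step(self, action):
        if self.revealed[action]:
            reward = -2.0                           # -2 for choosing an already revealed tile
        elif self.board[action] == 1:
            self.revealed[action] = True
            last_red = not np.any(self.board[~self.revealed] == 1)
            reward = 10.0 if last_red else 1.0      # +10 for the last red tile, else +1
        else:
            self.revealed[action] = True
            reward = -1.0                           # -1 for a blue tile
        done = not np.any(self.board[~self.revealed] == 1)
        return self._obs(), reward, done, {}


if __name__ == "__main__":
    env = DummyVecEnv([lambda: TileRevealEnv()])
    # Reported hyperparameters; the LSTM policy is an assumption for "recurrent network".
    model = A2C("MlpLstmPolicy", env, gamma=0.9, lr_schedule="linear",
                vf_coef=0.000675, ent_coef=0.000675, learning_rate=0.00235, verbose=1)
    # The paper trains for 1e6 episodes; the timestep budget below is a placeholder.
    model.learn(total_timesteps=200000)
```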
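
The Dataset Splits row can be illustrated in the same spirit. The rough sketch below reuses TileRevealEnv from the block above; the candidate grids, training budget, and number of evaluation episodes are assumptions, and fresh boards at each reset stand in for the paper's held-out validation set. Only the swept quantities (value function loss coefficient, entropy loss coefficient, learning rate) come from the paper.

```python
"""Hedged sketch of a hyperparameter sweep over vf_coef, ent_coef, and learning rate."""
import itertools
import numpy as np
from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv


def validation_return(model, env, n_episodes=50):
    """Average episode return on freshly sampled boards (proxy for a held-out validation set)."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total, state = env.reset(), [False], 0.0, None
        while not done[0]:
            action, state = model.predict(obs, state=state, mask=done)
            obs, reward, done, _ = env.step(action)
            total += reward[0]
        returns.append(total)
    return float(np.mean(returns))


best, best_score = None, -np.inf
for vf, ent, lr in itertools.product([1e-4, 6.75e-4, 1e-3],    # value function loss coefficient (placeholder grid)
                                     [1e-4, 6.75e-4, 1e-3],    # entropy loss coefficient (placeholder grid)
                                     [1e-3, 2.35e-3, 5e-3]):   # learning rate (placeholder grid)
    env = DummyVecEnv([lambda: TileRevealEnv()])  # TileRevealEnv from the sketch above
    model = A2C("MlpLstmPolicy", env, gamma=0.9, lr_schedule="linear",
                vf_coef=vf, ent_coef=ent, learning_rate=lr, verbose=0)
    model.learn(total_timesteps=50000)            # short budget for the sketch only
    score = validation_return(model, env)
    if score > best_score:
        best, best_score = (vf, ent, lr), score
print("selected hyperparameters:", best)
```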