Meta-Learning of Structured Task Distributions in Humans and Machines
Authors: Sreejan Kumar, Ishita Dasgupta, Jonathan Cohen, Nathaniel Daw, Thomas Griffiths
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train a standard meta-learning agent, a recurrent network trained with model-free reinforcement learning, and compare it with human performance across the two task distributions. We find a double dissociation in which humans do better in the structured task distribution whereas agents do better in the null task distribution despite comparable statistical complexity. |
| Researcher Affiliation | Academia | 1 Princeton Neuroscience Institute; 2 Department of Computer Science, Princeton University; 3 Department of Psychology, Princeton University |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm". |
| Open Source Code | No | The paper mentions "Stable baselines package Hill et al., 2018" with a GitHub URL, but this refers to a third-party tool used by the authors, not the source code for the methodology developed in this paper. |
| Open Datasets | No | The paper describes how they generated their own structured and null task distributions based on a grammar and statistical properties, but it does not provide concrete access information (e.g., a link, DOI, or repository) for these datasets. |
| Dataset Splits | Yes | We performed a hyperparameter sweep (value function loss coefficient, entropy loss coefficient, learning rate) using a held-out validation set for evaluation (see Appendix). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The agent was trained using Advantage Actor Critic (A2C) (Stable baselines package Hill et al., 2018). While a software package is mentioned, a specific version number for it is not provided. |
| Experiment Setup | Yes | The agent was trained with a linear learning rate schedule and 0.9 discount. The reward function was: +1 for revealing red tiles, -1 for blue tiles, +10 for the last red tile, and -2 for choosing an already revealed tile. The agent was trained for 10^6 episodes. ... The final selected hyperparameters for both task distributions were: value function coefficient=0.000675, entropy coefficient=0.000675, learning rate=0.00235. (See the configuration sketch after this table.) |
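
For illustration only, the sketch below shows how the reported reward values and selected hyperparameters could be assembled with the Stable Baselines (v2) A2C interface the paper cites. The `tile_reward` helper and the CartPole-v1 stand-in environment are hypothetical placeholders, since the authors' task environments and training code are not released; only the numeric values are taken from the paper as quoted above.

```python
# Minimal sketch, not the authors' released code: it only wires the scalar values
# quoted above into the Stable Baselines (v2) A2C API referenced in the paper.
# The tile_reward helper and the CartPole stand-in environment are hypothetical;
# the actual structured/null tile-revealing tasks are not publicly available.
import gym
from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv


def tile_reward(is_red, is_last_red, already_revealed):
    """Reward scheme as reported: +1 for a red tile, -1 for a blue tile,
    +10 for the last red tile, -2 for re-selecting an already revealed tile."""
    if already_revealed:
        return -2.0
    if is_red:
        return 10.0 if is_last_red else 1.0
    return -1.0


# Stand-in environment; the paper's tile-revealing grid task would go here.
env = DummyVecEnv([lambda: gym.make("CartPole-v1")])

model = A2C(
    "MlpLstmPolicy",        # recurrent policy, since the paper's agent is a recurrent network
    env,
    gamma=0.9,              # discount factor reported in the paper
    vf_coef=0.000675,       # selected value function loss coefficient
    ent_coef=0.000675,      # selected entropy loss coefficient
    learning_rate=0.00235,  # selected learning rate
    lr_schedule="linear",   # linear learning rate schedule, as described
    verbose=1,
)

# The paper reports 10^6 training episodes; total_timesteps is only a rough proxy.
model.learn(total_timesteps=1_000_000)
```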