Meta-Reinforcement Learning with Self-Modifying Networks

Authors: Mathieu Chalvidal, Thomas Serre, Rufin VanRullen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experimental evaluation, we investigate the reinforcement learning strategies implemented by the model and demonstrate that a single layer with lightweight parametrization can implement a wide spectrum of cognitive functions, from one-shot learning to continuous motor-control. In Section 5 we report experimental results in multiple contexts.
Researcher Affiliation | Academia | Mathieu Chalvidal, Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, France (mathieu_chalvid@brown.edu); Thomas Serre, Carney Institute for Brain Science, Brown University, U.S. (thomas_serre@brown.edu); Rufin VanRullen, Centre de Recherche Cerveau & Cognition, CNRS, Université de Toulouse, France (rufin.vanrullen@cnrs.fr)
Pseudocode | Yes | Algorithm 1: MetODS synaptic learning. (An illustrative sketch of this style of self-modifying update appears after the table.)
Open Source Code | Yes | Details for each experimental setting are further discussed in S.I. and the code can be found at https://github.com/mathieuchal/metods2022.
Open Datasets | Yes | To first illustrate that learnt synaptic dynamics can support fast behavioral adaptation, we use a classic experiment from the neuroscience literature originally presented by Harlow [79] and recently reintroduced in artificial meta-RL in [54], as well as a heavily-benchmarked MuJoCo directional locomotion task (see Fig. 3). And: First, we use the dexterous manipulation benchmark proposed in [81] using the benchmark suite [82], in which a Sawyer robot is tasked with diverse operations.
Dataset Splits | No | Insufficient: the paper refers to standard benchmarks and mentions training, but the main text does not give percentages, sample counts, or a methodology for training/validation/test splits.
Hardware Specification | No | Insufficient: the paper mentions 'computational resource constraints' but does not specify the hardware used for the experiments, such as GPU models, CPU types, or memory amounts.
Software Dependencies | No | Insufficient: the paper does not list version numbers for any software dependencies, libraries, or solvers used in the experiments.
Experiment Setup | No | Insufficient: the paper describes general experiment scenarios (e.g., a 'full adaptation episode consists in N=10 rollouts of 500 timesteps') but defers detailed settings, including specific hyperparameter values and full training configurations, to the supplementary information ('Details for each experimental setting are further discussed in S.I.'). (A sketch of the quoted adaptation-episode structure appears after the table.)
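
For orientation, the following is a minimal Python sketch of the kind of self-modifying (fast-weight) update that Algorithm 1 describes: fast weights rewritten online by a Hebbian-style outer-product rule whose gains (alpha, beta) stand in for the slow, meta-learned parameters. It is an illustration of the mechanism under those assumptions, not a transcription of the authors' algorithm; the actual implementation is in the repository linked above.

import numpy as np

def phi(x):
    # element-wise nonlinearity; tanh is a stand-in choice
    return np.tanh(x)

class SelfModifyingLayer:
    # Fast weights W are updated online within an episode; alpha and beta
    # play the role of meta-learned parameters trained across tasks.
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = np.zeros((dim, dim))                        # fast weights, reset per task
        self.alpha = 0.1 * rng.standard_normal((dim, dim))   # plasticity gains (assumed form)
        self.beta = 0.5                                       # read gain (assumed form)

    def step(self, s):
        v = phi(self.beta * (self.W @ s))        # read: activation from current fast weights
        self.W += self.alpha * np.outer(v, s)    # write: Hebbian outer-product update
        return v

# usage: the weights evolve across a stream of observations within one episode
layer = SelfModifyingLayer(dim=8)
for s in np.random.default_rng(1).standard_normal((5, 8)):
    v = layer.step(s)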
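Likewise, the adaptation-episode structure quoted in the Experiment Setup row (N=10 rollouts of 500 timesteps) could be scripted along the following lines. The environment id, the gymnasium API, and the random placeholder policy are assumptions for illustration, not details taken from the paper; the directional locomotion reward would additionally require a task-specific wrapper.

import gymnasium as gym  # assumes the gymnasium and mujoco packages are installed

N_ROLLOUTS, HORIZON = 10, 500   # the N=10 rollouts of 500 timesteps quoted above

env = gym.make("HalfCheetah-v4")  # placeholder id; the paper's exact task setup may differ
returns = []
for _ in range(N_ROLLOUTS):
    obs, _ = env.reset()
    total = 0.0
    for _ in range(HORIZON):
        action = env.action_space.sample()   # stand-in for the meta-learned policy
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    returns.append(total)
print(returns)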