The Benefits of Model-Based Generalization in Reinforcement Learning
Authors: Kenny John Young, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper combines theory with experiments: 'Here, we provide theoretical and empirical insight into when, and how, we can expect data generated by a learned model to be useful. First, we provide a simple theorem motivating how learning a model as an intermediate step can narrow down the set of possible value functions more than learning a value function directly from data using the Bellman equation. Second, we provide an illustrative example showing empirically how a similar effect occurs in a more concrete setting with neural network function approximation. Finally, we provide extensive experiments showing the benefit of model-based learning for online RL in environments with combinatorial complexity, but factored structure that allows a learned model to generalize.' (A notation sketch of the model-free vs. model-based contrast appears after the table.) |
| Researcher Affiliation | Academia | ¹University of Alberta and the Alberta Machine Intelligence Institute; ²The Swiss AI Lab IDSIA/USI/SUPSI; ³AI Initiative, KAUST, Thuwal, Saudi Arabia. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce the main experiments is available at: https://github.com/kenjyoung/Model_Generalization_Code_supplement. |
| Open Datasets | No | The paper describes generating data within custom environments (Proc Maze, Button Grid, Pan Flute, Open Grid) and using 'training datasets' or 'model-generated transitions'. It does not state that these specific datasets are publicly available, nor does it refer to pre-existing, publicly accessible datasets with direct access information. |
| Dataset Splits | No | The paper describes an online learning setting where data is generated through interaction with environments, rather than explicit splits of a fixed dataset. It mentions 'evaluating each hyperparameter setting' but does not specify a separate validation dataset split. |
| Hardware Specification | No | The paper states: 'We were able to run 30 seeds efficiently in parallel on a single GPU using automatic batching in JAX (Bradbury et al., 2018).' This specifies only 'a single GPU', with no model number or other hardware details. (A sketch of this seed-parallel batching pattern follows the table.) |
| Software Dependencies | No | The paper mentions 'JAX (Bradbury et al., 2018)' and the Adam optimizer without version numbers for either; Table 1 lists Adam's hyperparameters but no library versions. |
| Experiment Setup | Yes | Table 1: Table of hyperparameters used in experiments in Section 4, including Number of Hidden Layers (3), Number of Hidden Units (200), Hidden Activation (ELU), Optimizer (Adam), Discount Factor (0.9), Batch Size, Target Network Update Frequency, Buffer Size, Model Learning Step-Size, and Rollout Length. (A hypothetical instantiation of the architecture rows follows the table.) |
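For context on the Research Type row: the paper's central theoretical claim contrasts two ways of constraining a value function from data. A standard-notation sketch of that contrast (our notation, not quoted from the paper): model-free learning constrains the value function only through Bellman consistency on observed transitions, while model-based learning first fits a model and then solves for values under it, which can rule out more candidate value functions.

```latex
% Model-free: V^pi is constrained only by Bellman consistency
% on the transitions actually observed in the data.
V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s),\; s' \sim P(\cdot \mid s,a)}
             \left[ r(s,a) + \gamma\, V^{\pi}(s') \right]

% Model-based: first fit \hat{P} and \hat{r} to the data, then take the
% unique fixed point of the Bellman operator under the learned model
% (with \gamma = 0.9 per Table 1).
\hat{V}^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s),\; s' \sim \hat{P}(\cdot \mid s,a)}
                   \left[ \hat{r}(s,a) + \gamma\, \hat{V}^{\pi}(s') \right]
```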
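The Hardware Specification row quotes the paper's note on 'automatic batching in JAX'. A minimal sketch of how 30 seeds can run as one batched computation on a single GPU via jax.vmap; init_params, train_step, and all shapes here are illustrative stand-ins, not the authors' code:

```python
import jax
import jax.numpy as jnp

def init_params(key):
    # Illustrative: a single weight matrix per seed; the paper's networks are larger.
    return jax.random.normal(key, (4, 2))

def train_step(params, batch):
    # Illustrative squared-error loss standing in for the actual RL update.
    def loss(p):
        return jnp.mean((batch @ p) ** 2)
    grads = jax.grad(loss)(params)
    return params - 1e-3 * grads

# One parameter set per seed, initialized from 30 independent PRNG keys.
keys = jax.random.split(jax.random.PRNGKey(0), 30)
params = jax.vmap(init_params)(keys)

# vmap maps train_step over the leading (seed) axis of params while sharing
# the batch, so all 30 runs execute as one batched GPU computation.
batched_step = jax.jit(jax.vmap(train_step, in_axes=(0, None)))
batch = jnp.ones((8, 4))
params = batched_step(params, batch)
```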
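Finally, a hypothetical instantiation of the architecture rows of Table 1 (3 hidden layers, 200 ELU units, Adam). Flax and Optax, the input and output widths, and the step-size are assumptions for illustration; the paper confirms only JAX and the listed hyperparameters:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn   # Flax is an assumption; the paper only mentions JAX.
import optax              # Optax is likewise assumed for the Adam optimizer.

class ValueNet(nn.Module):
    """MLP matching Table 1: 3 hidden layers of 200 units with ELU activations."""
    @nn.compact
    def __call__(self, x):
        for _ in range(3):
            x = nn.elu(nn.Dense(200)(x))
        return nn.Dense(1)(x)  # Scalar output width is an assumption.

net = ValueNet()
params = net.init(jax.random.PRNGKey(0), jnp.zeros((1, 16)))  # 16-dim input is illustrative.
optimizer = optax.adam(learning_rate=3e-4)  # Adam per Table 1; this step-size is a placeholder.
opt_state = optimizer.init(params)
```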