Model-Based Reinforcement Learning with Value-Targeted Regression

Authors: Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin Yang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'To complement the theoretical findings, results from a number of small-scale, synthetic experiments confirm that our algorithm is competitive in terms of its regret.' and, from Section 6 (Numerical Experiments): 'The goal of our experiments is to provide insight into the benefits and/or pitfalls of using value-targets for fitting models, both with and without optimistic planning. We run our experiments in the tabular setting as it is easy to keep aspects of the test environments under control while avoiding approximate computations.'
Researcher Affiliation | Collaboration | 1) Amii and Department of Computing Science, University of Alberta; 2) School of Mathematical Science, Peking University; 3) DeepMind; 4) Department of Electrical Engineering, Princeton University; 5) Department of Electrical and Computer Engineering, University of California, Los Angeles.
Pseudocode | Yes | Algorithm 1: UCRL-VTR (a sketch of its value-targeted regression objective appears after this table).
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | No | The paper describes synthetic environments (River Swim and Wide Tree) used for experiments, but does not provide concrete access information (links, DOIs, repositories) for public dataset availability. For example: 'The schematic diagram of the River Swim environment is shown in Figure 1.' and 'We introduce a novel tabular MDP we call Wide Tree.' (An illustrative River Swim construction appears after this table.)
Dataset Splits | No | The paper describes experiments in reinforcement learning environments but does not mention traditional dataset splits (e.g., percentages, sample counts, or predefined splits) for training, validation, or testing, as it focuses on online learning. No relevant text about dataset splits was found.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. No relevant text was found.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers that would be needed to replicate the experiments. No relevant text was found.
Experiment Setup | Yes | 'We experiment with small environments with S = 5 and set the horizon to H = 20. The value that we found to work the best for EGRL-VTR is ε = 0.2 and the value that we found to work best for EG-Freq is ε = 0.12. We set ε = 0.1 in this environment, as this allows the model error of EG-Freq to match that of UC-MatrixRL.' (An epsilon-greedy sketch follows the table.)
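
The pseudocode row above points to Algorithm 1 (UCRL-VTR), whose model-fitting step regresses the model's predicted next-state value onto the realized value of the observed next state. The snippet below is a minimal sketch of that regression objective for a tabular transition model; the variable names are ours, and the confidence set and optimistic planning steps of the full algorithm are deliberately omitted.

```python
import numpy as np

def value_targeted_loss(P, V, transitions):
    """Value-targeted regression objective, sketched from Algorithm 1.

    P           : (S, A, S) array, a candidate transition model
    V           : (S,) array, the current value estimates
    transitions : iterable of (s, a, s_next) tuples observed so far

    For each transition, the model's predicted expected next value
    <P[s, a], V> is regressed onto the realized target V[s_next].
    """
    return sum((P[s, a] @ V - V[s_next]) ** 2 for s, a, s_next in transitions)
```

UCRL-VTR would minimize this loss over a confidence set of models and then plan optimistically; the sketch covers only the regression-target construction that gives the algorithm its name.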
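
River Swim, quoted in the Open Datasets row, is a standard chain MDP in which swimming right against the current is stochastic and only the right end pays a large reward. The report does not quote the transition or reward constants the authors used, so the values below (including p_right and p_stay) are conventional placeholders rather than the paper's instantiation.

```python
import numpy as np

def make_riverswim(S=5, p_right=0.3, p_stay=0.6):
    """Illustrative River Swim chain with S states and 2 actions.

    Action 0 (swim left) always succeeds; action 1 (swim right) moves
    right with probability p_right, stays put with probability p_stay,
    and otherwise drifts left. All numeric constants are placeholders.
    """
    P = np.zeros((S, 2, S))   # P[s, a, s'] transition probabilities
    R = np.zeros((S, 2))      # expected immediate rewards
    for s in range(S):
        P[s, 0, max(s - 1, 0)] = 1.0              # swimming left always works
        if s == 0:
            P[s, 1, 0] = 1.0 - p_right
            P[s, 1, 1] = p_right
        elif s == S - 1:
            P[s, 1, s] = p_right + p_stay         # held at the right end
            P[s, 1, s - 1] = 1.0 - p_right - p_stay
        else:
            P[s, 1, s + 1] = p_right
            P[s, 1, s] = p_stay
            P[s, 1, s - 1] = 1.0 - p_right - p_stay
    R[0, 0] = 0.005      # small reward for loitering at the left end
    R[S - 1, 1] = 1.0    # large reward for reaching the right end
    return P, R
```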
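
The experiment-setup row reports only the exploration rates (ε ∈ {0.2, 0.12, 0.1}); the standard epsilon-greedy rule that the EGRL-VTR and EG-Freq baselines rely on is sketched below, with an interface of our own choosing.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng=None):
    """Epsilon-greedy action selection, as used by the baselines above.

    With probability epsilon pick a uniformly random action; otherwise
    act greedily with respect to the current value estimates.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit
```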