Model-Based Reinforcement Learning with Value-Targeted Regression
Authors: Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin Yang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'To complement the theoretical findings, results from a number of small-scale, synthetic experiments confirm that our algorithm is competitive in terms of its regret.' and, from Section 6 (Numerical Experiments): 'The goal of our experiments is to provide insight into the benefits and/or pitfalls of using value-targets for fitting models, both with and without optimistic planning. We run our experiments in the tabular setting as it is easy to keep aspects of the test environments under control while avoiding approximate computations.' |
| Researcher Affiliation | Collaboration | (1) Amii and Department of Computing Science, University of Alberta; (2) School of Mathematical Science, Peking University; (3) DeepMind; (4) Department of Electrical Engineering, Princeton University; (5) Department of Electrical and Computer Engineering, University of California, Los Angeles. |
| Pseudocode | Yes | Algorithm 1 UCRL-VTR |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | No | The paper describes synthetic environments (River Swim and Wide Tree) used for experiments, but does not provide concrete access information (links, DOIs, repositories) for public dataset availability. For example: 'The schematic diagram of the River Swim environment is shown in Figure 1.' and 'We introduce a novel tabular MDP we call Wide Tree.' |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments but does not mention traditional dataset splits (e.g., percentages, sample counts, or predefined splits) for training, validation, or testing, as it focuses on online learning. No relevant text about dataset splits was found. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. No relevant text was found. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers that would be needed to replicate the experiments. No relevant text was found. |
| Experiment Setup | Yes | We experiment with small environments with S = 5 and set the horizon to H = 20. The value that we found to work the best for EGRL-VTR is ε = 0.2 and the value that we found to work best for EG-Freq is ε = 0.12. We set ε = 0.1 in this environment, as this allows the model error of EG-Freq to match that of UC-Matrix RL. |
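Algorithm 1 (UCRL-VTR) fits its transition model by value-targeted regression: rather than matching next-state frequencies, it prefers the model whose predicted expectation ⟨P(·|s,a), V_k⟩ best matches the observed target V_k(s'). In the tabular case with ridge regularization, the least-squares problem is block-diagonal across state-action pairs, so the minimal sketch below treats a single (s, a). The function names, the ridge parameter `lam`, and the noise-free usage data are illustrative assumptions, and the confidence sets and optimistic planning of UCRL-VTR are deliberately omitted.

```python
# Hypothetical sketch of value-targeted regression (VTR) for ONE state-action
# pair of a tabular MDP. Names and the ridge parameter `lam` are assumptions.
import random
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def vtr_estimate(values: Sequence[Sequence[float]], targets: Sequence[float],
                 lam: float = 1e-6) -> List[float]:
    """Return p minimizing sum_k (<p, V_k> - y_k)^2 + lam * ||p||^2.

    V_k is the value function used in episode k (a vector over next states);
    y_k is the regression target, e.g. V_k(s') for the observed next state s'.
    """
    n = len(values[0])
    gram = [[lam if i == j else 0.0 for j in range(n)] for i in range(n)]
    rhs = [0.0] * n
    for v, y in zip(values, targets):
        for i in range(n):
            rhs[i] += v[i] * y
            for j in range(n):
                gram[i][j] += v[i] * v[j]
    return solve_linear(gram, rhs)


if __name__ == "__main__":
    rng = random.Random(0)
    p_true = [0.1, 0.2, 0.3, 0.4]  # made-up next-state distribution
    vs = [[rng.random() for _ in range(4)] for _ in range(50)]
    ys = [sum(p * v for p, v in zip(p_true, vk)) for vk in vs]  # noise-free targets
    print([round(p, 3) for p in vtr_estimate(vs, ys)])  # approximately p_true
```

With noisy targets V_k(s') in place of the exact expectations, the same estimator converges as the value functions V_k vary enough to excite every direction; that requirement is one reason the paper pairs the regression with optimistic planning.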
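The Research Type row reports that the algorithm is competitive in terms of regret. In a finite-horizon tabular MDP, the regret of episode k is the gap V*_1(s_1) − V^{π_k}_1(s_1) between the optimal value and the value of the policy actually played, and the cumulative regret sums these gaps over episodes. The following is a minimal sketch that computes both quantities by backward induction, assuming known transitions `P`, expected rewards `R`, and deterministic step-dependent policies; all names are hypothetical and the toy MDP is invented for illustration.

```python
from typing import List

# Hypothetical type aliases for a finite-horizon tabular MDP.
Transitions = List[List[List[float]]]  # P[s][a][s'] = probability of s' after (s, a)
Rewards = List[List[float]]            # R[s][a] = expected reward of (s, a)
Policy = List[List[int]]               # policy[h][s] = action at step h in state s


def optimal_values(P: Transitions, R: Rewards, H: int) -> List[List[float]]:
    """Backward induction; V[h][s] is the optimal value with H - h steps left."""
    S, A = len(R), len(R[0])
    V = [[0.0] * S for _ in range(H + 1)]  # V[H] holds terminal values of 0
    for h in range(H - 1, -1, -1):
        for s in range(S):
            V[h][s] = max(
                R[s][a] + sum(P[s][a][t] * V[h + 1][t] for t in range(S))
                for a in range(A)
            )
    return V


def policy_values(P: Transitions, R: Rewards, H: int, policy: Policy) -> List[List[float]]:
    """Evaluate a deterministic, step-dependent policy by backward induction."""
    S = len(R)
    V = [[0.0] * S for _ in range(H + 1)]
    for h in range(H - 1, -1, -1):
        for s in range(S):
            a = policy[h][s]
            V[h][s] = R[s][a] + sum(P[s][a][t] * V[h + 1][t] for t in range(S))
    return V


def episode_regret(P: Transitions, R: Rewards, H: int, policy: Policy, s0: int) -> float:
    """V*_1(s0) - V^pi_1(s0); step 1 of the math is index 0 in the code."""
    return optimal_values(P, R, H)[0][s0] - policy_values(P, R, H, policy)[0][s0]


def cumulative_regret(P: Transitions, R: Rewards, H: int,
                      policies: List[Policy], s0: int) -> float:
    """Sum of per-episode regrets for the policies played in episodes 1..K."""
    return sum(episode_regret(P, R, H, pi, s0) for pi in policies)


if __name__ == "__main__":
    # Toy 2-state MDP: action 0 stays put, action 1 switches state.
    P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    R = [[0.1, 0.0], [1.0, 0.0]]
    stay = [[0, 0] for _ in range(3)]
    print(episode_regret(P, R, 3, stay, 0))
```

Because the experiments are tabular and small (the setup quotes S = 5 and H = 20), exact backward induction of this kind is cheap, which is consistent with the paper's stated aim of avoiding approximate computations.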
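The Experiment Setup row quotes exploration settings for EGRL-VTR and EG-Freq, whose "EG" prefix suggests ε-greedy exploration, in environments with S = 5 and H = 20. The minimal sketch below shows ε-greedy action selection on a River Swim-style chain. The transition probabilities (0.35 / 0.60 / 0.05), the rewards (0.005 and 1.0), the start state, and every function name are assumptions for illustration, not values taken from the paper.

```python
import random
from typing import List, Tuple

LEFT, RIGHT = 0, 1

# Hypothetical River Swim parameters (assumptions, not taken from the paper).
P_RIGHT_SUCCESS = 0.35  # swimming right moves one state upstream...
P_RIGHT_STAY = 0.60     # ...or leaves the agent where it is;
                        # the remaining 0.05 pushes it one state back
R_LEFT_END = 0.005      # small reward for LEFT at the leftmost state
R_RIGHT_END = 1.0       # large reward for RIGHT at the rightmost state


def river_swim_step(s: int, a: int, n_states: int,
                    rng: random.Random) -> Tuple[int, float]:
    """Sample (next_state, reward) for one step of a River Swim-style chain."""
    if a == LEFT:  # swimming with the current always succeeds
        return max(s - 1, 0), (R_LEFT_END if s == 0 else 0.0)
    reward = R_RIGHT_END if s == n_states - 1 else 0.0
    u = rng.random()
    if u < P_RIGHT_SUCCESS:
        return min(s + 1, n_states - 1), reward
    if u < P_RIGHT_SUCCESS + P_RIGHT_STAY:
        return s, reward
    return max(s - 1, 0), reward


def epsilon_greedy(q_row: List[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))  # ties go to the lowest action index


def run_episode(q: List[List[List[float]]], epsilon: float, horizon: int,
                n_states: int, seed: int) -> List[Tuple[int, int, float]]:
    """Roll out one episode from state 0; q[h][s][a] holds action values."""
    rng = random.Random(seed)
    s, trajectory = 0, []
    for h in range(horizon):
        a = epsilon_greedy(q[h][s], epsilon, rng)
        s_next, r = river_swim_step(s, a, n_states, rng)
        trajectory.append((s, a, r))
        s = s_next
    return trajectory


if __name__ == "__main__":
    S, H = 5, 20  # sizes quoted in the experiment setup
    q = [[[0.0, 0.0] for _ in range(S)] for _ in range(H)]  # placeholder values
    traj = run_episode(q, epsilon=0.2, horizon=H, n_states=S, seed=0)
    print("return:", sum(r for _, _, r in traj))
```

In a full ε-greedy agent, `q` would be recomputed after each episode by planning in the current model estimate (frequency-based for EG-Freq, value-targeted for EGRL-VTR, if the naming is read that way); here it is a fixed placeholder so that only the exploration rule is shown.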