Model-Based Reinforcement Learning with Value-Targeted Regression

Authors: Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin Yang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'To complement the theoretical findings, results from a number of small-scale, synthetic experiments confirm that our algorithm is competitive in terms of its regret.' and, from Section 6 (Numerical Experiments): 'The goal of our experiments is to provide insight into the benefits and/or pitfalls of using value-targets for fitting models, both with and without optimistic planning. We run our experiments in the tabular setting as it is easy to keep aspects of the test environments under control while avoiding approximate computations.'
Researcher Affiliation | Collaboration | 1) Amii and Department of Computing Science, University of Alberta; 2) School of Mathematical Science, Peking University; 3) DeepMind; 4) Department of Electrical Engineering, Princeton University; 5) Department of Electrical and Computer Engineering, University of California, Los Angeles.
Pseudocode | Yes | Algorithm 1: UCRL-VTR (a sketch of its value-targeted regression objective appears after this table).
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | No | The paper describes synthetic environments (River Swim and Wide Tree) used for experiments, but does not provide concrete access information (links, DOIs, repositories) for public dataset availability. For example: 'The schematic diagram of the River Swim environment is shown in Figure 1.' and 'We introduce a novel tabular MDP we call Wide Tree.' (An illustrative River Swim construction appears after this table.)
Dataset Splits | No | The paper describes experiments in reinforcement learning environments but does not mention traditional dataset splits (e.g., percentages, sample counts, or predefined splits) for training, validation, or testing, as it focuses on online learning. No relevant text about dataset splits was found.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. No relevant text was found.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers that would be needed to replicate the experiments. No relevant text was found.
Experiment Setup | Yes | 'We experiment with small environments with S = 5 and set the horizon to H = 20. The value that we found to work the best for EGRL-VTR is ε = 0.2 and the value that we found to work best for EG-Freq is ε = 0.12. We set ε = 0.1 in this environment, as this allows the model error of EG-Freq to match that of UC-MatrixRL.' (An epsilon-greedy sketch follows the table.)
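
The pseudocode row above points to Algorithm 1 (UCRL-VTR), whose model-fitting step regresses the model's predicted next-state value onto the realized value of the observed next state. The snippet below is a minimal sketch of that regression objective for a tabular transition model; the variable names are ours, and the confidence set and optimistic planning steps of the full algorithm are deliberately omitted.

```python
import numpy as np

def value_targeted_loss(P, V, transitions):
    """Value-targeted regression objective, sketched from Algorithm 1.

    P           : (S, A, S) array, a candidate transition model
    V           : (S,) array, the current value estimates
    transitions : iterable of (s, a, s_next) tuples observed so far

    For each transition, the model's predicted expected next value
    <P[s, a], V> is regressed onto the realized target V[s_next].
    """
    return sum((P[s, a] @ V - V[s_next]) ** 2 for s, a, s_next in transitions)
```

UCRL-VTR would minimize this loss over a confidence set of models and then plan optimistically; the sketch covers only the regression-target construction that gives the algorithm its name.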
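
River Swim, quoted in the Open Datasets row, is a standard chain MDP in which swimming right against the current is stochastic and only the right end pays a large reward. The report does not quote the transition or reward constants the authors used, so the values below (including p_right and p_stay) are conventional placeholders rather than the paper's instantiation.

```python
import numpy as np

def make_riverswim(S=5, p_right=0.3, p_stay=0.6):
    """Illustrative River Swim chain with S states and 2 actions.

    Action 0 (swim left) always succeeds; action 1 (swim right) moves
    right with probability p_right, stays put with probability p_stay,
    and otherwise drifts left. All numeric constants are placeholders.
    """
    P = np.zeros((S, 2, S))   # P[s, a, s'] transition probabilities
    R = np.zeros((S, 2))      # expected immediate rewards
    for s in range(S):
        P[s, 0, max(s - 1, 0)] = 1.0              # swimming left always works
        if s == 0:
            P[s, 1, 0] = 1.0 - p_right
            P[s, 1, 1] = p_right
        elif s == S - 1:
            P[s, 1, s] = p_right + p_stay         # held at the right end
            P[s, 1, s - 1] = 1.0 - p_right - p_stay
        else:
            P[s, 1, s + 1] = p_right
            P[s, 1, s] = p_stay
            P[s, 1, s - 1] = 1.0 - p_right - p_stay
    R[0, 0] = 0.005      # small reward for loitering at the left end
    R[S - 1, 1] = 1.0    # large reward for reaching the right end
    return P, R
```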
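
The experiment-setup row reports only the exploration rates (ε ∈ {0.2, 0.12, 0.1}); the standard epsilon-greedy rule that the EGRL-VTR and EG-Freq baselines rely on is sketched below, with an interface of our own choosing.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng=None):
    """Epsilon-greedy action selection, as used by the baselines above.

    With probability epsilon pick a uniformly random action; otherwise
    act greedily with respect to the current value estimates.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit
```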