Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization
Authors: Clement Gehring, Kenji Kawaguchi, Jiaoyang Huang, Leslie Pack Kaelbling
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our contributions are primarily theoretical and aim to provide a theoretical account of the performance of end-to-end model-based methods. To help in this matter, we also provide some empirical results in simple illustrative problems which serve to demonstrate properties derived from our analysis. (Section 4, "Empirical results") |
| Researcher Affiliation | Academia | Clement Gehring, Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology (clement@gehring.io); Kenji Kawaguchi, Center of Mathematical Sciences and Applications, Harvard University (kkawaguchi@fas.harvard.edu); Jiaoyang Huang, Courant Institute of Mathematical Sciences, New York University (jh4427@nyu.edu); Leslie Pack Kaelbling, Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology (lpk@csail.mit.edu) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All other implementation details, data and code are publicly available (footnote 1: https://github.com/gehring/implicit-estimators). |
| Open Datasets | Yes | We consider three simple, illustrative domains: a chain MDP, the four rooms domain and the mountain car domain, which we describe below. ... Four rooms [21]: ... Mountain car [13, 20]: ... (An illustrative chain-MDP sketch follows the table.) |
| Dataset Splits | No | The paper mentions 'unseen trajectories' and 'generate an additional test dataset' but does not provide specific percentages or counts for training, validation, or test splits, nor does it refer to a standard predefined split with a citation. |
| Hardware Specification | No | The paper states 'We provide sufficient information to estimate this in the appendix as well as the hardware used.' but does not specify any particular hardware (e.g., GPU/CPU models) in the provided main text. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | For all experiments, we used a batch size k = 25. We ran these experiments with several combinations of learning rates and internal discounts but only present a few representative results here. (An illustrative sweep sketch follows the table.) |
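The domains row above names the environments only in prose. For concreteness, here is a minimal sketch of a tabular chain MDP of the kind the paper describes; the state count, slip probability, and reward placement are illustrative assumptions, not the paper's configuration, and the code does not depend on the authors' repository.

```python
import numpy as np

def chain_mdp(n_states=10, slip=0.1):
    """Build tabular dynamics for an illustrative N-state chain MDP.

    Two actions (left, right); each action moves one step in its
    direction with probability 1 - slip and slips the other way
    otherwise. Reward is received for reaching the rightmost state.
    NOTE: n_states, slip, and the reward scheme are placeholders,
    not the configuration used in the paper.
    """
    P = np.zeros((2, n_states, n_states))  # P[a, s, s'] transition probabilities
    R = np.zeros((2, n_states))            # R[a, s] expected immediate reward
    for s in range(n_states):
        left, right = max(s - 1, 0), min(s + 1, n_states - 1)
        P[0, s, left] += 1.0 - slip   # action 0: move left, may slip right
        P[0, s, right] += slip
        P[1, s, right] += 1.0 - slip  # action 1: move right, may slip left
        P[1, s, left] += slip
        R[:, s] = P[:, s, n_states - 1]  # expected reward = prob. of reaching goal
    return P, R

P, R = chain_mdp()
print(P.shape, R.shape)  # (2, 10, 10) (2, 10)
```

The four rooms [21] and mountain car [13, 20] domains are standard benchmarks and could be constructed analogously from their published descriptions.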
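The only setup detail quoted verbatim is the batch size k = 25; the learning rates and internal discounts that were combined are not given in the excerpt. The sketch below shows how such a hyperparameter grid might be enumerated; the grid values are placeholder assumptions, not the authors' settings.

```python
from itertools import product

BATCH_SIZE = 25  # stated in the paper: "we used a batch size k = 25"

# Placeholder grids -- the paper says only that "several combinations
# of learning rates and internal discounts" were run, without values.
learning_rates = [1e-3, 1e-2, 1e-1]
internal_discounts = [0.9, 0.99]

configs = [
    {"batch_size": BATCH_SIZE, "learning_rate": lr, "internal_discount": gamma}
    for lr, gamma in product(learning_rates, internal_discounts)
]

for cfg in configs:
    print(cfg)  # each config would drive one training run
```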