Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization

Authors: Clement Gehring, Kenji Kawaguchi, Jiaoyang Huang, Leslie Pack Kaelbling

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our contributions are primarily theoretical and aim to provide a theoretical account of the performance of end-to-end model-based methods. To help in this matter, we also provide some empirical results in simple illustrative problems which serve to demonstrate properties derived from our analysis. ... 4 Empirical results (The implicit-parameterization view behind these results is sketched below.)
Researcher Affiliation | Academia | Clement Gehring, Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology (clement@gehring.io); Kenji Kawaguchi, Center of Mathematical Sciences and Applications, Harvard University (kkawaguchi@fas.harvard.edu); Jiaoyang Huang, Courant Institute of Mathematical Sciences, New York University (jh4427@nyu.edu); Leslie Pack Kaelbling, Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology (lpk@csail.mit.edu)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | All other implementation details, data and code are publicly available. (Footnote 1: https://github.com/gehring/implicit-estimators)
Open Datasets | Yes | We consider three simple, illustrative domains: a chain MDP, the four rooms domain and the mountain car domain, which we describe below. ... Four rooms [21]: ... Mountain car [13, 20]: ... (The classic mountain car dynamics are sketched below.)
Dataset Splits | No | The paper mentions 'unseen trajectories' and 'generate an additional test dataset' but does not provide specific percentages or counts for training, validation, or test splits, nor does it refer to a standard predefined split with a citation.
Hardware Specification | No | The paper states 'We provide sufficient information to estimate this in the appendix as well as the hardware used.' but does not specify any particular hardware (e.g., GPU or CPU models) in the main text provided.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | For all experiments, we used a batch size k = 25. We ran these experiments with several combinations of learning rates and internal discounts but only present a few representative results here. (A hypothetical sweep over these settings is sketched below.)
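
For context on the title's central idea: end-to-end model-based methods treat the value function as implicitly parameterized by a learned model, i.e. as the fixed point of a Bellman equation rather than the output of an explicit function approximator. Below is a minimal tabular sketch of that construction, assuming a hypothetical 5-state chain MDP under a fixed policy; it illustrates the general idea, not the paper's actual implementation.

```python
import numpy as np

def implicit_value(P, r, gamma):
    """Value function implicitly defined by the fixed point V = r + gamma * P @ V.

    Solved in closed form as V = (I - gamma * P)^{-1} r. In the end-to-end
    setting, P and r would be learned model parameters, so V is an implicit
    function of those parameters rather than a directly parameterized output.
    """
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

# Hypothetical 5-state chain MDP under a fixed policy: move right with
# probability 0.9, stay put otherwise; reward only in the final, absorbing state.
n = 5
P = np.zeros((n, n))
for s in range(n - 1):
    P[s, s + 1] = 0.9
    P[s, s] = 0.1
P[n - 1, n - 1] = 1.0
r = np.zeros(n)
r[n - 1] = 1.0

print(implicit_value(P, r, gamma=0.9))
```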
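
The Open Datasets row lists mountain car among the illustrative domains. As a reference point, here is a sketch of the classic mountain car dynamics (the standard formulation usually attributed to Moore and described by Sutton and Barto); the paper's exact variant, e.g. its reward or start-state distribution, may differ.

```python
import math

def mountain_car_step(position, velocity, action):
    """One step of the classic mountain car dynamics.

    action is -1 (reverse), 0 (coast), or +1 (forward). Constants follow the
    standard textbook formulation.
    """
    velocity += 0.001 * action - 0.0025 * math.cos(3.0 * position)
    velocity = max(-0.07, min(0.07, velocity))
    position += velocity
    if position < -1.2:            # left wall: clamp position, kill velocity
        position, velocity = -1.2, 0.0
    done = position >= 0.5         # goal at the top of the right hill
    return position, velocity, done
```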
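
The Experiment Setup row fixes only the batch size (k = 25) and says that learning rates and internal discounts were swept without listing their values. The sketch below shows what such a sweep could look like; the grids and the run_experiment stub are hypothetical placeholders, not values from the paper.

```python
import itertools

BATCH_SIZE = 25  # k = 25, the one setting the paper states explicitly

# Hypothetical grids; the paper sweeps these quantities but does not list values.
LEARNING_RATES = [1e-3, 3e-3, 1e-2]
INTERNAL_DISCOUNTS = [0.9, 0.95, 0.99]

def run_experiment(lr, internal_discount, batch_size):
    """Placeholder for a single training run; returns a dummy score."""
    return 0.0

for lr, gamma_int in itertools.product(LEARNING_RATES, INTERNAL_DISCOUNTS):
    score = run_experiment(lr, gamma_int, BATCH_SIZE)
    print(f"lr={lr:g} internal_discount={gamma_int} score={score}")
```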