Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization
Authors: Clement Gehring, Kenji Kawaguchi, Jiaoyang Huang, Leslie Pack Kaelbling
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our contributions are primarily theoretical and aim to provide a theoretical account of the performance of end-to-end model-based methods. To help in this matter, we also provide some empirical results in simple illustrative problems which serve to demonstrate properties derived from our analysis. (Section 4, "Empirical results") |
| Researcher Affiliation | Academia | Clement Gehring, Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology (clement@gehring.io); Kenji Kawaguchi, Center of Mathematical Sciences and Applications, Harvard University (kkawaguchi@fas.harvard.edu); Jiaoyang Huang, Courant Institute of Mathematical Sciences, New York University (jh4427@nyu.edu); Leslie Pack Kaelbling, Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology (lpk@csail.mit.edu) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All other implementation details, data and code are publicly available (footnote 1: https://github.com/gehring/implicit-estimators). |
| Open Datasets | Yes | We consider three simple, illustrative domains: a chain MDP, the four rooms domain and the mountain car domain, which we describe below. ... Four rooms [21]: ... Mountain car [13, 20]: ... (An illustrative chain-MDP sketch follows the table.) |
| Dataset Splits | No | The paper mentions 'unseen trajectories' and 'generate an additional test dataset' but does not provide specific percentages or counts for training, validation, or test splits, nor does it refer to a standard predefined split with a citation. |
| Hardware Specification | No | The paper states 'We provide sufficient information to estimate this in the appendix as well as the hardware used.' but does not specify any particular hardware (e.g., GPU/CPU models) in the provided main text. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | For all experiments, we used a batch size k = 25. We ran these experiments with several combinations of learning rates and internal discounts but only present a few representative results here. (An illustrative sweep sketch follows the table.) |
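The domains row above names the environments only in prose. For concreteness, here is a minimal sketch of a tabular chain MDP of the kind the paper describes; the state count, slip probability, and reward placement are illustrative assumptions, not the paper's configuration, and the code does not depend on the authors' repository.

```python
import numpy as np

def chain_mdp(n_states=10, slip=0.1):
    """Build tabular dynamics for an illustrative N-state chain MDP.

    Two actions (left, right); each action moves one step in its
    direction with probability 1 - slip and slips the other way
    otherwise. Reward is received for reaching the rightmost state.
    NOTE: n_states, slip, and the reward scheme are placeholders,
    not the configuration used in the paper.
    """
    P = np.zeros((2, n_states, n_states))  # P[a, s, s'] transition probabilities
    R = np.zeros((2, n_states))            # R[a, s] expected immediate reward
    for s in range(n_states):
        left, right = max(s - 1, 0), min(s + 1, n_states - 1)
        P[0, s, left] += 1.0 - slip   # action 0: move left, may slip right
        P[0, s, right] += slip
        P[1, s, right] += 1.0 - slip  # action 1: move right, may slip left
        P[1, s, left] += slip
        R[:, s] = P[:, s, n_states - 1]  # expected reward = prob. of reaching goal
    return P, R

P, R = chain_mdp()
print(P.shape, R.shape)  # (2, 10, 10) (2, 10)
```

The four rooms [21] and mountain car [13, 20] domains are standard benchmarks and could be constructed analogously from their published descriptions.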
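The only setup detail quoted verbatim is the batch size k = 25; the learning rates and internal discounts that were combined are not given in the excerpt. The sketch below shows how such a hyperparameter grid might be enumerated; the grid values are placeholder assumptions, not the authors' settings.

```python
from itertools import product

BATCH_SIZE = 25  # stated in the paper: "we used a batch size k = 25"

# Placeholder grids -- the paper says only that "several combinations
# of learning rates and internal discounts" were run, without values.
learning_rates = [1e-3, 1e-2, 1e-1]
internal_discounts = [0.9, 0.99]

configs = [
    {"batch_size": BATCH_SIZE, "learning_rate": lr, "internal_discount": gamma}
    for lr, gamma in product(learning_rates, internal_discounts)
]

for cfg in configs:
    print(cfg)  # each config would drive one training run
```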