Uncertainty Estimation Using Riemannian Model Dynamics for Offline Reinforcement Learning
Authors: Guy Tennenholtz, Shie Mannor
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We leverage our method for uncertainty estimation in a pessimistic model-based framework, showing a significant improvement upon contemporary model-based offline approaches on continuous control and autonomous driving benchmarks. |
| Researcher Affiliation | Collaboration | Guy Tennenholtz, Technion Institute of Technology; Shie Mannor, Technion Institute of Technology & Nvidia Research |
| Pseudocode | Yes | Algorithm 1 GELATO: Geometrically Enriched LATent model for Offline reinforcement learning |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] (checklist item 1a) |
| Open Datasets | Yes | We used D4RL [Fu et al., 2020] and the autonomous vehicle environments highway-env [Leurent, 2018] as benchmarks for all of our experiments. (A minimal loading sketch for both benchmarks appears after this table.) |
| Dataset Splits | No | The paper uses the D4RL and highway-env datasets and mentions training with 1M or 2M samples, but it does not specify explicit train/validation/test splits or percentages. |
| Hardware Specification | Yes | All agents were trained... using a single GPU (RTX 2080)... |
| Software Dependencies | No | No software dependencies with version numbers are provided. The paper mentions 'FAISS', 'Soft Learning', and 'PPO', but without version details. (A hedged FAISS usage sketch appears after this table.) |
| Experiment Setup | Yes | We set k = 5 neighbors for the penalized reward (Equation (3)). All agents were trained for 1M steps (for continuous control benchmarks) and 350K steps (for the driving benchmarks)... and averaged over 5 seeds. (See the k-NN penalty sketch after this table.) |
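
As a pointer for reproduction, the sketch below shows the standard way to load a D4RL dataset and instantiate a highway-env task. The specific task names (`halfcheetah-medium-v2`, `highway-v0`) are illustrative assumptions; the paper's exact environment IDs are not quoted in this table.

```python
import gym
import d4rl         # registers the D4RL offline-RL environments with gym
import highway_env  # registers the highway-env driving environments with gym

# Illustrative task choices, not necessarily the ones used in the paper.
env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict with observations, actions, rewards, ...
print(dataset["observations"].shape)

driving_env = gym.make("highway-v0")
obs = driving_env.reset()
```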
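
The k = 5 neighbor setting and the FAISS dependency suggest a nearest-neighbor reward penalty of roughly the following shape. This is a minimal sketch under strong assumptions: it uses plain Euclidean distance and a hypothetical penalty weight `beta`, whereas the paper computes distances under a learned Riemannian metric on the latent space (its Equation (3)).

```python
import numpy as np
import faiss

def penalized_reward(rewards, queries, data, k=5, beta=1.0):
    """Sketch of a k-NN uncertainty penalty on rewards.

    Assumptions: Euclidean distance stands in for the paper's learned
    Riemannian metric, and `beta` is a hypothetical penalty weight.
    """
    index = faiss.IndexFlatL2(data.shape[1])  # exact L2 search over dataset points
    index.add(np.ascontiguousarray(data, dtype=np.float32))
    sq_dists, _ = index.search(np.ascontiguousarray(queries, dtype=np.float32), k)
    penalty = np.sqrt(sq_dists).mean(axis=1)  # mean distance to the k nearest neighbors
    return rewards - beta * penalty
```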