Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity
Authors: Alekh Agarwal, Tong Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We numerically verify the suboptimality bounds via some standard reinforcement learning problems in Section 5. In this section, we numerically evaluate the performance of MB-OPS and compare it against baselines on common benchmarks. |
| Researcher Affiliation | Academia | Ruichu Cai¹, Zhiwen Wu¹, Zhen Liu², Xinwang Liu³,*; ¹ School of Computer Science and Engineering, South China University of Technology; ² Peng Cheng Laboratory, Shenzhen; ³ National University of Defense Technology |
| Pseudocode | Yes | Algorithm 1 Model-Based Optimistic Posterior Sampling (MB-OPS) |
| Open Source Code | No | The paper does not provide explicit statements about the public release of source code or links to a repository. |
| Open Datasets | Yes | We consider three environments: River Swim, Chain, and Mountain Car, which are widely used benchmarks in reinforcement learning. |
| Dataset Splits | No | The paper describes reinforcement learning environments (River Swim, Chain, Mountain Car) where data is generated through interaction, but it does not specify traditional train/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., 'Python 3.x', 'PyTorch x.x'). |
| Experiment Setup | Yes | For the River Swim environment, we set the discount factor γ = 0.99, exploration parameter c = 0.1, the number of episodes to 2000, and the number of steps in each episode to 50. We set the step size α = 0.01 for the Mountain Car problem and a smaller value α = 0.001 for the River Swim and Chain environments. |
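
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which makes it easier to see which values are stated and which are not. The following is a minimal sketch, not the authors' code: the names `ExperimentConfig`, `CONFIGS`, and `total_env_steps` are hypothetical, and any value the quoted text does not give for an environment is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the settings quoted in the table row."""
    env_name: str
    step_size: float                          # alpha in the quoted text
    discount: Optional[float] = None          # gamma; None where not stated
    exploration_c: Optional[float] = None     # exploration parameter c
    num_episodes: Optional[int] = None
    steps_per_episode: Optional[int] = None


# Only values that appear in the quoted setup are filled in.
CONFIGS = {
    "river_swim": ExperimentConfig(
        env_name="river_swim",
        step_size=0.001,
        discount=0.99,
        exploration_c=0.1,
        num_episodes=2000,
        steps_per_episode=50,
    ),
    "chain": ExperimentConfig(env_name="chain", step_size=0.001),
    "mountain_car": ExperimentConfig(env_name="mountain_car", step_size=0.01),
}


def total_env_steps(cfg: ExperimentConfig) -> Optional[int]:
    """Return episodes * steps per episode, or None if either is unstated."""
    if cfg.num_episodes is None or cfg.steps_per_episode is None:
        return None
    return cfg.num_episodes * cfg.steps_per_episode


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg.step_size, total_env_steps(cfg))
```

Under these settings, River Swim amounts to 2000 × 50 = 100,000 environment steps; the interaction budget for Chain and Mountain Car cannot be derived from the quoted text.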