Adaptive Discretization for Model-Based Reinforcement Learning
Authors: Sean Sinclair, Tianyu Wang, Gauri Jain, Siddhartha Banerjee, Christina Yu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate this via experiments on several canonical control problems, which shows that our algorithm empirically performs significantly better than fixed discretization in terms of both faster convergence and lower memory usage. |
| Researcher Affiliation | Academia | Sean R. Sinclair, Cornell University (srs429@cornell.edu); Tianyu Wang, Duke University (tianyu@cs.duke.edu); Gauri Jain, Cornell University (gauri.g.jain@gmail.com); Siddhartha Banerjee, Cornell University (sbanerjee@cornell.edu); Christina Lee Yu, Cornell University (cleeyu@cornell.edu) |
| Pseudocode | Yes | For full pseudocode of the algorithm, and a discussion on implementation details, see Appendix G. ... Algorithm 1 Model-Based Reinforcement Learning with Adaptive Partitioning (ADAMB) |
| Open Source Code | Yes | The code for the experiments are available at https://github.com/seanrsinclair/AdaptiveQLearning. |
| Open Datasets | No | The paper refers to "several canonical control problems" and "synthetic experiments" (e.g., Oil Problem, Ambulance Problem) but does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset. These appear to be simulated environments rather than pre-existing datasets. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits. It describes episodic learning but not traditional data splits. |
| Hardware Specification | No | The paper mentions hardware in the context of other works (e.g., Alpha Go Zero used 4 TPUs and 64 GPUs) but does not provide any specific hardware details (GPU/CPU models, memory) used for their own experiments. |
| Software Dependencies | No | The paper does not list specific software components with their version numbers (e.g., "Python 3.8" or "PyTorch 1.9"). |
| Experiment Setup | Yes | ADAMB(S, A, D, H, K, δ) ... For all our simulations, we used a budget of K = 2000 episodes, and H = 20 as the horizon of the MDP. ... The splitting threshold is defined via $n_+(B) = \phi\, 2^{d_S \ell(B)}$ for $d_S > 2$ and $n_+(B) = \phi\, 2^{(d_S+2)\ell(B)}$ for $d_S \le 2$, where the difference in terms comes from the Wasserstein concentration. This is in contrast to the splitting threshold for the model-free algorithm, where $n_+(B) = 2^{2\ell(B)}$ [38]. The $\phi$ term is chosen to minimize the dependence on H in the final regret bound, where $\phi = H^{(d+d_S)/(d+1)}$. |
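
For concreteness, the splitting rule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration of the threshold formulas only; the function name, argument names, and the example dimensions are our own assumptions, not taken from the paper or its repository:

```python
def splitting_threshold(level, d_S, d, H, phi=None):
    """Splitting threshold n_+(B) for a ball B at depth `level` of the partition.

    A sketch of the rule quoted above (variable names are ours):
      d_S : dimension of the state space
      d   : dimension of the joint state-action space
      H   : episode horizon
    """
    if phi is None:
        # phi is chosen to minimize the H-dependence of the regret bound;
        # the quoted choice is phi = H^{(d + d_S) / (d + 1)}.
        phi = H ** ((d + d_S) / (d + 1))
    if d_S > 2:
        # n_+(B) = phi * 2^{d_S * l(B)}
        return phi * 2 ** (d_S * level)
    # n_+(B) = phi * 2^{(d_S + 2) * l(B)} when d_S <= 2
    # (the extra term reflects the Wasserstein concentration rate)
    return phi * 2 ** ((d_S + 2) * level)


# Example with the reported experiment horizon H = 20 and a hypothetical
# one-dimensional state and action space (d_S = 1, d = 2):
print(splitting_threshold(level=3, d_S=1, d=2, H=20))
```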