Adaptive Discretization for Model-Based Reinforcement Learning

Authors: Sean Sinclair, Tianyu Wang, Gauri Jain, Siddhartha Banerjee, Christina Yu

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate this via experiments on several canonical control problems, which shows that our algorithm empirically performs significantly better than fixed discretization in terms of both faster convergence and lower memory usage.
Researcher Affiliation | Academia | Sean R. Sinclair (Cornell University, srs429@cornell.edu); Tianyu Wang (Duke University, tianyu@cs.duke.edu); Gauri Jain (Cornell University, gauri.g.jain@gmail.com); Siddhartha Banerjee (Cornell University, sbanerjee@cornell.edu); Christina Lee Yu (Cornell University, cleeyu@cornell.edu)
Pseudocode | Yes | For full pseudocode of the algorithm, and a discussion on implementation details, see Appendix G. ... Algorithm 1: Model-Based Reinforcement Learning with Adaptive Partitioning (ADAMB)
Open Source Code | Yes | The code for the experiments is available at https://github.com/seanrsinclair/AdaptiveQLearning.
Open Datasets | No | The paper refers to "several canonical control problems" and "synthetic experiments" (e.g., the Oil Problem and the Ambulance Problem) but does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset. These appear to be simulated environments rather than pre-existing datasets.
Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits. It describes episodic learning rather than traditional data splits.
Hardware Specification | No | The paper mentions hardware only in the context of other works (e.g., AlphaGo Zero used 4 TPUs and 64 GPUs) but does not provide any specific hardware details (GPU/CPU models, memory) used for its own experiments.
Software Dependencies | No | The paper does not list specific software components with their version numbers (e.g., "Python 3.8" or "PyTorch 1.9").
Experiment Setup | Yes | ADAMB(S, A, D, H, K, δ) ... For all our simulations, we used a budget of K = 2000 episodes, and H = 20 as the horizon of the MDP. ... The splitting threshold is defined via n_+(B) = φ 2^{d_S ℓ(B)} for d_S > 2 and n_+(B) = φ 2^{(d_S+2) ℓ(B)} for d_S ≤ 2, where the difference in terms comes from the Wasserstein concentration. This is in contrast to the splitting threshold for the model-free algorithm, where n_+(B) = 2^{2ℓ(B)} [38]. The φ term is chosen to minimize the dependence on H in the final regret bound, where φ = H^{(d+d_S)/(d+1)}.
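
To make the splitting rule quoted in the Experiment Setup row concrete, here is a minimal Python sketch of the threshold computation. It assumes d denotes the joint state-action dimension, ℓ(B) the depth of ball B in the adaptive partition, and that φ multiplies the threshold as in the quoted formula; the function name and the example dimensions are illustrative and are not taken from the authors' repository.

```python
def splitting_threshold(level, d_s, d, horizon):
    """Sample count n_+(B) at which a ball B at depth level = l(B) is split.

    Follows the thresholds quoted above:
        n_+(B) = phi * 2**(d_S * l(B))        if d_S > 2
        n_+(B) = phi * 2**((d_S + 2) * l(B))  if d_S <= 2
    with phi = H**((d + d_S) / (d + 1)).
    """
    phi = horizon ** ((d + d_s) / (d + 1))
    exponent = d_s if d_s > 2 else d_s + 2
    return phi * 2 ** (exponent * level)


# Illustrative values only: H = 20 as in the quoted setup, a 1-D state
# space (d_S = 1) and a 1-D action space, so the joint dimension d = 2.
for level in range(4):
    print(level, splitting_threshold(level, d_s=1, d=2, horizon=20))
```

As the loop suggests, the threshold grows geometrically with the depth of a ball, so finer regions of the partition require proportionally more visits before they are split again.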