Adaptive Discretization for Model-Based Reinforcement Learning

Authors: Sean Sinclair, Tianyu Wang, Gauri Jain, Siddhartha Banerjee, Christina Yu

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate this via experiments on several canonical control problems, which shows that our algorithm empirically performs significantly better than fixed discretization in terms of both faster convergence and lower memory usage.
Researcher Affiliation | Academia | Sean R. Sinclair (Cornell University, srs429@cornell.edu); Tianyu Wang (Duke University, tianyu@cs.duke.edu); Gauri Jain (Cornell University, gauri.g.jain@gmail.com); Siddhartha Banerjee (Cornell University, sbanerjee@cornell.edu); Christina Lee Yu (Cornell University, cleeyu@cornell.edu)
Pseudocode | Yes | For full pseudocode of the algorithm, and a discussion on implementation details, see Appendix G. ... Algorithm 1: Model-Based Reinforcement Learning with Adaptive Partitioning (ADAMB)
Open Source Code | Yes | The code for the experiments is available at https://github.com/seanrsinclair/AdaptiveQLearning.
Open Datasets | No | The paper refers to "several canonical control problems" and "synthetic experiments" (e.g., the Oil Problem and the Ambulance Problem) but does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset. These appear to be simulated environments rather than pre-existing datasets.
Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits. It describes episodic learning rather than traditional data splits.
Hardware Specification | No | The paper mentions hardware only in the context of other works (e.g., AlphaGo Zero used 4 TPUs and 64 GPUs) but does not provide any specific hardware details (GPU/CPU models, memory) used for its own experiments.
Software Dependencies | No | The paper does not list specific software components with their version numbers (e.g., "Python 3.8" or "PyTorch 1.9").
Experiment Setup | Yes | ADAMB(S, A, D, H, K, δ) ... For all our simulations, we used a budget of K = 2000 episodes, and H = 20 as the horizon of the MDP. ... The splitting threshold is defined via n_+(B) = φ 2^{d_S ℓ(B)} for d_S > 2 and n_+(B) = φ 2^{(d_S+2) ℓ(B)} for d_S ≤ 2, where the difference in terms comes from the Wasserstein concentration. This is in contrast to the splitting threshold for the model-free algorithm, where n_+(B) = 2^{2ℓ(B)} [38]. The φ term is chosen to minimize the dependence on H in the final regret bound, where φ = H^{(d+d_S)/(d+1)}.
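
To make the splitting rule quoted in the Experiment Setup row concrete, here is a minimal Python sketch of the threshold computation. It assumes d denotes the joint state-action dimension, ℓ(B) the depth of ball B in the adaptive partition, and that φ multiplies the threshold as in the quoted formula; the function name and the example dimensions are illustrative and are not taken from the authors' repository.

```python
def splitting_threshold(level, d_s, d, horizon):
    """Sample count n_+(B) at which a ball B at depth level = l(B) is split.

    Follows the thresholds quoted above:
        n_+(B) = phi * 2**(d_S * l(B))        if d_S > 2
        n_+(B) = phi * 2**((d_S + 2) * l(B))  if d_S <= 2
    with phi = H**((d + d_S) / (d + 1)).
    """
    phi = horizon ** ((d + d_s) / (d + 1))
    exponent = d_s if d_s > 2 else d_s + 2
    return phi * 2 ** (exponent * level)


# Illustrative values only: H = 20 as in the quoted setup, a 1-D state
# space (d_S = 1) and a 1-D action space, so the joint dimension d = 2.
for level in range(4):
    print(level, splitting_threshold(level, d_s=1, d=2, horizon=20))
```

As the loop suggests, the threshold grows geometrically with the depth of a ball, so finer regions of the partition require proportionally more visits before they are split again.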