Oracle Inequalities for Model Selection in Offline Reinforcement Learning
Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with several numerical simulations showing it is capable of reliably selecting a good model class. In Section 5, we demonstrate the effectiveness of MODBE on several simulated experimental domains. We use neural network-based offline RL algorithms as baselines and show that MODBE is able to reliably select a good model class. |
| Researcher Affiliation | Collaboration | Jonathan N. Lee, Stanford University (jnl@stanford.edu); George Tucker, Google Research (gjt@google.com); Ofir Nachum, Google Research (ofirnachum@google.com); Bo Dai, Google Research (bodai@google.com); Emma Brunskill, Stanford University (ebrun@cs.stanford.edu) |
| Pseudocode | Yes | Algorithm 1 Model Selection via Bellman Error (MODBE) |
| Open Source Code | Yes | Supplementary material is available at: https://sites.google.com/stanford.edu/offline-model-selection. Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | We evaluated MODBE in three simulated environments with discrete actions: (1) synthetic contextual bandits (CB), (2) Gym Cart Pole, (3) Gym Mountain Car. |
| Dataset Splits | Yes | All training and validation sets were split 80/20: let n_train = ⌈0.8n⌉ and n_valid = ⌊0.2n⌋, and split the dataset D randomly into D_train = (D_train,h) of n_train samples and D_valid = (D_valid,h) of n_valid samples for each h ∈ [H]. (A split sketch follows the table.) |
| Hardware Specification | No | The paper's self-checklist indicates that hardware details are included ("Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]"), but these specific details are not present in the provided paper text or Appendix D. |
| Software Dependencies | No | "Our setup for the RL problems in Gym (Brockman et al., 2016) builds on top of the open-source d3rlpy framework (Seno and Imai, 2021). We used DQN (Mnih et al., 2015), which is closest to FQI." The paper mentions software frameworks and libraries but does not provide specific version numbers for them. |
| Experiment Setup | Yes | All training and validation sets were split 80/20. We used DQN (Mnih et al., 2015), which is closest to FQI. We considered model classes that were two-layer neural networks with ReLU activations and d nodes in the hidden layer, and varied the parameter d. (An illustrative network sketch follows the table.) |
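The Dataset Splits row above quotes an 80/20 split with n_train = ⌈0.8n⌉ and n_valid = ⌊0.2n⌋ per timestep h ∈ [H]. The following is a minimal sketch of that split, assuming the offline dataset is stored as a list of per-timestep NumPy arrays; the function name `split_dataset` and the data layout are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the 80/20 per-timestep train/validation split described above.
# Assumption: the offline dataset is a list of H indexable NumPy arrays, one per timestep h.
import math
import numpy as np

def split_dataset(per_step_data, seed=0):
    """Randomly split each per-timestep dataset D_h into D_train,h and D_valid,h."""
    rng = np.random.default_rng(seed)
    train, valid = [], []
    for D_h in per_step_data:             # one array of transitions per h in [H]
        n = len(D_h)
        n_train = math.ceil(0.8 * n)      # n_train = ceil(0.8 n)
        perm = rng.permutation(n)
        train.append(D_h[perm[:n_train]])
        valid.append(D_h[perm[n_train:]]) # the remaining floor(0.2 n) samples
    return train, valid
```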
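The Experiment Setup row describes the candidate model classes as two-layer neural networks with ReLU activations and d hidden nodes, with d varied across classes. A minimal PyTorch-style sketch of one such Q-network is below; `make_q_network` and the example widths are hypothetical, since the paper's experiments build on d3rlpy and DQN rather than this code.

```python
# Illustrative sketch (not the authors' code) of a candidate model class:
# a two-layer ReLU Q-network whose hidden width d is the model-selection knob.
import torch.nn as nn

def make_q_network(obs_dim: int, num_actions: int, d: int) -> nn.Module:
    """Two-layer ReLU network with d hidden units; one Q-value output per discrete action."""
    return nn.Sequential(
        nn.Linear(obs_dim, d),        # hidden layer of width d (varied across model classes)
        nn.ReLU(),
        nn.Linear(d, num_actions),
    )

# Varying d yields the family of model classes that MODBE selects among,
# e.g. candidate_widths = [16, 64, 256]   # hypothetical values for illustration
```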