Oracle Inequalities for Model Selection in Offline Reinforcement Learning

Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conclude with several numerical simulations showing it is capable of reliably selecting a good model class. In Section 5, we demonstrate the effectiveness of MODBE on several simulated experimental domains. We use neural network-based offline RL algorithms as baselines and show that MODBE is able to reliably select a good model class."
Researcher Affiliation | Collaboration | Jonathan N. Lee (Stanford University, jnl@stanford.edu); George Tucker (Google Research, gjt@google.com); Ofir Nachum (Google Research, ofirnachum@google.com); Bo Dai (Google Research, bodai@google.com); Emma Brunskill (Stanford University, ebrun@cs.stanford.edu)
Pseudocode | Yes | Algorithm 1: Model Selection via Bellman Error (MODBE). (A hedged Python sketch of this kind of selection rule appears after the table.)
Open Source Code | Yes | "Supplementary material is available at: https://sites.google.com/stanford.edu/offline-model-selection." "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"
Open Datasets | Yes | "We evaluated MODBE in three simulated environments with discrete actions: (1) synthetic contextual bandits (CB), (2) Gym Cart Pole, (3) Gym Mountain Car." (See the environment sketch after the table.)
Dataset Splits | Yes | "All training and validation sets were split 80/20. Let n_train = ⌈0.8 n⌉ and n_valid = ⌊0.2 n⌋, and split the dataset D randomly into D_train = (D_train,h) of n_train samples and D_valid = (D_valid,h) of n_valid samples for each h ∈ [H]." (A short splitting sketch appears after the table.)
Hardware Specification | No | The paper's self-checklist states that compute details are included ("Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]"), but no such details appear in the provided paper text or in Appendix D.
Software Dependencies | No | "Our setup for the RL problems in Gym (Brockman et al., 2016) builds on top of the open-source d3rlpy framework (Seno and Imai, 2021)." "We used DQN (Mnih et al., 2015), which is closest to FQI." The paper names these frameworks and libraries but does not give specific version numbers for them.
Experiment Setup | Yes | "All training and validation sets were split 80/20." "We used DQN (Mnih et al., 2015), which is closest to FQI." "We considered model classes that were two-layer neural networks with ReLU activations and d nodes in the hidden layer and varied the parameter d." (A hedged sketch of such a model-class family appears after the table.)
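
The pseudocode itself (Algorithm 1) lives in the paper and is not reproduced on this page. Below is a minimal, hedged Python sketch of a selection rule consistent with the algorithm's name: fit each candidate model class with FQI on the training split, estimate its Bellman error on the validation split, and select the smallest class whose estimated error is not significantly worse than that of any larger class. The callables fit_fqi and bellman_error, and the slack term, are hypothetical placeholders rather than the paper's exact definitions.

```python
import math

def select_model_class(model_classes, D_train, D_valid, fit_fqi, bellman_error, c=1.0):
    """Lepski-style selection: pick the smallest class whose validation Bellman
    error is within a statistical slack of every larger class.

    fit_fqi(F, D_train) -> fitted Q-function; bellman_error(q, D_valid) -> float.
    Both are caller-supplied; the slack term below is illustrative, not the
    paper's exact threshold.
    """
    n_valid = sum(len(D_valid[h]) for h in D_valid)
    q_fns = [fit_fqi(F, D_train) for F in model_classes]
    errs = [bellman_error(q, D_valid) for q in q_fns]
    M = len(model_classes)
    slack = c * math.sqrt(math.log(max(M, 2)) / max(n_valid, 1))
    for k in range(M):
        # Accept class k only if no larger class beats it by more than the slack.
        if all(errs[k] <= errs[j] + slack for j in range(k + 1, M)):
            return k          # smallest acceptable class
    return M - 1              # fall back to the largest class
```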
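
For the Gym environments listed under Open Datasets, the sketch below shows how an offline dataset of per-timestep transitions could be gathered with the classic Gym API (Brockman et al., 2016). The random behavior policy, episode count, and environment IDs are illustrative assumptions, not the paper's data-collection protocol.

```python
import gym

def collect_offline_dataset(env_name="CartPole-v1", num_episodes=100):
    """Collect (h, s, a, r, s') tuples with a random behavior policy (illustrative)."""
    env = gym.make(env_name)          # e.g. "CartPole-v1" or "MountainCar-v0"
    dataset = []
    for _ in range(num_episodes):
        obs, done, h = env.reset(), False, 0
        while not done:
            action = env.action_space.sample()              # random behavior policy
            next_obs, reward, done, _ = env.step(action)    # classic 4-tuple Gym API
            dataset.append((h, obs, action, reward, next_obs))
            obs, h = next_obs, h + 1
    return dataset
```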
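
The 80/20 rule under Dataset Splits translates directly into n_train = ⌈0.8 n⌉ and n_valid = ⌊0.2 n⌋ for each timestep h. A minimal sketch, assuming each per-timestep dataset D[h] is a list of transitions:

```python
import math
import random

def split_dataset(D, seed=0):
    """Split each per-timestep dataset D[h] into ceil(0.8 n) train / floor(0.2 n) valid."""
    rng = random.Random(seed)
    D_train, D_valid = {}, {}
    for h, samples in D.items():
        n = len(samples)
        n_train = math.ceil(0.8 * n)      # n_train = ceil(0.8 n)
        shuffled = samples[:]
        rng.shuffle(shuffled)
        D_train[h] = shuffled[:n_train]
        D_valid[h] = shuffled[n_train:]   # remaining floor(0.2 n) samples
    return D_train, D_valid
```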
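
The model classes under Experiment Setup are two-layer ReLU networks whose hidden width d is varied. The PyTorch sketch below is an assumption for illustration (the paper's Gym experiments build on d3rlpy); the state/action dimensions and the particular values of d are placeholders.

```python
import torch.nn as nn

def make_q_network(state_dim, num_actions, d):
    """Two-layer ReLU Q-network with d hidden units (one model class per value of d)."""
    return nn.Sequential(
        nn.Linear(state_dim, d),
        nn.ReLU(),
        nn.Linear(d, num_actions),
    )

# A nested family of model classes indexed by hidden width d (widths are illustrative).
model_classes = [make_q_network(state_dim=4, num_actions=2, d=d)
                 for d in (4, 16, 64, 256)]
```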