Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Oracle Inequalities for Model Selection in Offline Reinforcement Learning

Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conclude with several numerical simulations showing it is capable of reliably selecting a good model class. In Section 5, we demonstrate the effectiveness of MODBE on several simulated experimental domains. We use neural network-based offline RL algorithms as baselines and show that MODBE is able to reliably select a good model class."
Researcher Affiliation | Collaboration | Jonathan N. Lee (Stanford University), George Tucker (Google Research), Ofir Nachum (Google Research), Bo Dai (Google Research), Emma Brunskill (Stanford University)
Pseudocode | Yes | Algorithm 1: Model Selection via Bellman Error (MODBE)
Open Source Code | Yes | "Supplementary material is available at: https://sites.google.com/stanford.edu/offline-model-selection." "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"
Open Datasets | Yes | "We evaluated MODBE in three simulated environments with discrete actions: (1) synthetic contextual bandits (CB), (2) Gym Cart Pole, (3) Gym Mountain Car."
Dataset Splits | Yes | "All training and validation sets were split 80/20." "Let n_train = ⌈0.8n⌉ and n_valid = ⌊0.2n⌋, and split the dataset D randomly into D_train = (D_train,h) of n_train samples and D_valid = (D_valid,h) of n_valid samples for each h ∈ [H]."
Hardware Specification | No | The paper's self-checklist indicates that hardware details are included ("Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]"), but these specific details are not present in the provided paper text or Appendix D.
Software Dependencies | No | "Our setup for the RL problems in Gym (Brockman et al., 2016) builds on top of the open-source d3rlpy framework (Seno and Imai, 2021). We used DQN (Mnih et al., 2015), which is closest to FQI." The paper names software frameworks and libraries but does not provide specific version numbers for them.
Experiment Setup | Yes | "All training and validation sets were split 80/20." "We used DQN (Mnih et al., 2015), which is closest to FQI." "We considered model classes that were two-layer neural networks with ReLU activations and d nodes in the hidden layer and varied the parameter d."
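The per-step 80/20 split quoted under Dataset Splits (n_train = ⌈0.8n⌉, n_valid = ⌊0.2n⌋, one split per horizon step h ∈ [H]) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `split_dataset`, the dict layout of D, and the seed handling are assumptions.

```python
import math
import random

def split_dataset(D, H, seed=0):
    """Sketch of the paper's 80/20 split, applied per horizon step h.

    `D` is assumed to map each step h in range(H) to a list of samples.
    Returns (D_train, D_valid) with the same dict layout.
    """
    rng = random.Random(seed)
    D_train, D_valid = {}, {}
    for h in range(H):
        samples = list(D[h])
        rng.shuffle(samples)          # random split, as in the quote
        n = len(samples)
        n_train = math.ceil(0.8 * n)  # n_train = ceil(0.8 n)
        D_train[h] = samples[:n_train]
        D_valid[h] = samples[n_train:]  # n_valid = n - n_train = floor(0.2 n)
    return D_train, D_valid
```

Note that taking the validation set as the remainder automatically gives ⌊0.2n⌋ samples, since ⌈0.8n⌉ + ⌊0.2n⌋ = n.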