Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Oracle Inequalities for Model Selection in Offline Reinforcement Learning
Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with several numerical simulations showing it is capable of reliably selecting a good model class. In Section 5, we demonstrate the effectiveness of MODBE on several simulated experimental domains. We use neural network-based offline RL algorithms as baselines and show that MODBE is able to reliably select a good model class. |
| Researcher Affiliation | Collaboration | Jonathan N. Lee (Stanford University), George Tucker (Google Research), Ofir Nachum (Google Research), Bo Dai (Google Research), Emma Brunskill (Stanford University) |
| Pseudocode | Yes | Algorithm 1 Model Selection via Bellman Error (MODBE) |
| Open Source Code | Yes | Supplementary material is available at: https://sites.google.com/stanford.edu/offline-model-selection. Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | We evaluated MODBE in three simulated environments with discrete actions: (1) synthetic contextual bandits (CB), (2) Gym Cart Pole, (3) Gym Mountain Car. |
| Dataset Splits | Yes | All training and validation sets were split 80/20. Let n_train = ⌈0.8n⌉ and n_valid = ⌊0.2n⌋, and split the dataset D randomly into D_train = (D_train,h) of n_train samples and D_valid = (D_valid,h) of n_valid samples for each h ∈ [H]. |
| Hardware Specification | No | The paper's self-checklist indicates that hardware details are included ("Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]"), but these specific details are not present in the provided paper text or Appendix D. |
| Software Dependencies | No | Our setup for the RL problems in Gym (Brockman et al., 2016) builds on top of the open-source d3rlpy framework (Seno and Imai, 2021). We used DQN (Mnih et al., 2015), which is closest to FQI. The paper mentions software frameworks and libraries but does not provide specific version numbers for them. |
| Experiment Setup | Yes | All training and validation sets were split 80/20. We used DQN (Mnih et al., 2015), which is closest to FQI. We considered model classes that were two-layer neural networks with ReLU activations and d nodes in the hidden layer, and varied the parameter d. |
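The Dataset Splits row quotes a specific 80/20 rule: n_train = ⌈0.8n⌉ and n_valid = ⌊0.2n⌋, with the dataset shuffled before splitting. A minimal sketch of that rule is below; the helper name `split_dataset` is hypothetical and this is not the authors' released code, only an illustration of the quoted ceil/floor convention.

```python
import math
import random

def split_dataset(dataset, seed=0):
    """Randomly split a dataset 80/20 into train/validation sets,
    with n_train = ceil(0.8 * n) and n_valid = floor(0.2 * n),
    matching the split rule quoted in the Dataset Splits row.
    (Hypothetical helper, not the paper's released implementation.)"""
    n = len(dataset)
    n_train = math.ceil(0.8 * n)
    n_valid = math.floor(0.2 * n)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    train = [dataset[i] for i in indices[:n_train]]
    valid = [dataset[i] for i in indices[n_train:n_train + n_valid]]
    return train, valid

# For n = 101: n_train = ceil(80.8) = 81, n_valid = floor(20.2) = 20,
# so every sample lands in exactly one split.
train, valid = split_dataset(list(range(101)))
```

Note that with ceil/floor the two parts always cover the whole dataset exactly (⌈0.8n⌉ + ⌊0.2n⌋ = n), which a naive `int(0.8 * n)` truncation does not guarantee.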