Budgeting Counterfactual for Offline RL
Authors: Yao Liu, Pratik Chaudhari, Rasool Fakoor
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that the overall performance of our method is better than the state-of-the-art offline RL methods on tasks in the widely-used D4RL benchmarks. |
| Researcher Affiliation | Collaboration | Yao Liu¹, Pratik Chaudhari¹·², Rasool Fakoor¹ (¹Amazon Web Services, ²University of Pennsylvania); {yaoliuai,prtic,fakoor}@amazon.com |
| Pseudocode | Yes (hedged sketch after the table) | Algorithm 1 BCOL Training and Algorithm 2 BCOL Inference |
| Open Source Code | No | We will release the code to reproduce the experiment upon publication. |
| Open Datasets | Yes (loading example after the table) | We evaluate our BCOL algorithm against prior offline RL methods on the OpenAI Gym MuJoCo tasks and AntMaze tasks in the D4RL benchmark [9]. |
| Dataset Splits | No | The paper describes evaluation procedures and random seeds but does not explicitly state training/validation/test dataset splits in terms of percentages or sample counts. |
| Hardware Specification | Yes | Machine type: AWS EC2 g4dn.2xlarge; GPU: Tesla T4; CPU: Intel Xeon 2.5GHz |
| Software Dependencies | Yes | CUDA 11.0; NVIDIA Driver 450.142.00; PyTorch 1.12.1; Gym 19.0; Python 3.8.13; NumPy 1.21.5; D4RL datasets v2 |
| Experiment Setup | Yes | We list all hyperparameter values and neural architecture details in Table 2. For the SAC-based implementation, we follow CDC's [8] hyperparameter values. The only exception is that we use 2 Q functions instead of 4, and we sample 5 actions from the actor (m = 5) instead of 15. |
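
The D4RL datasets cited in the "Open Datasets" row are publicly loadable. Below is a minimal loading sketch assuming the standard `gym` + `d4rl` APIs; the task name `halfcheetah-medium-v2` is an illustrative choice (the paper evaluates on several OpenAI Gym MuJoCo and AntMaze tasks), not a value taken from the table.

```python
# Minimal sketch: loading a v2 D4RL dataset with the standard gym + d4rl APIs.
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("halfcheetah-medium-v2")  # illustrative task name

# qlearning_dataset returns a dict with observations, actions, rewards,
# next_observations, and terminals, ready for offline RL training.
dataset = d4rl.qlearning_dataset(env)
print(dataset["observations"].shape, dataset["actions"].shape)
```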
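
The "Pseudocode" row names Algorithm 2 (BCOL Inference) without reproducing it. The sketch below illustrates only the general budgeted-counterfactual idea, deviating from a dataset-like action only while a per-episode budget remains; every name here (`q_values`, `actor`, `behavior_policy`) and the exact selection rule are assumptions for illustration, not the paper's algorithm. The m = 5 actor samples and the use of 2 Q functions come from the "Experiment Setup" row.

```python
# Hedged sketch of budget-aware action selection (an illustration of the
# budgeted-counterfactual idea, NOT the paper's exact Algorithm 2).

def select_action(q_values, actor, behavior_policy, state, budget, m=5):
    """q_values is a list of learned Q functions (the table mentions 2);
    actor and behavior_policy are assumed callables. All names illustrative."""
    # Conservative value estimate: minimum over the ensemble of Q functions.
    def q_min(s, a):
        return min(q(s, a) for q in q_values)

    b_action = behavior_policy(state)  # dataset-like (in-distribution) action
    if budget <= 0:
        return b_action, budget        # budget exhausted: stay in-distribution

    # Sample m candidate actions from the actor and keep the best under Q.
    candidates = [actor(state) for _ in range(m)]
    best = max(candidates, key=lambda a: q_min(state, a))

    # Spend one unit of budget only if the counterfactual action looks
    # strictly better than the behavior-like action under the learned Q.
    if q_min(state, best) > q_min(state, b_action):
        return best, budget - 1
    return b_action, budget
```

The ensemble minimum mirrors the common clipped double-Q trick in SAC-style implementations; whether BCOL's Algorithm 2 uses exactly this rule cannot be determined from the table alone.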