Budgeting Counterfactual for Offline RL

Authors: Yao Liu, Pratik Chaudhari, Rasool Fakoor

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that the overall performance of our method is better than the state-of-the-art offline RL methods on tasks in the widely-used D4RL benchmarks.
Researcher Affiliation | Collaboration | Yao Liu (1), Pratik Chaudhari (1,2), Rasool Fakoor (1); (1) Amazon Web Services, (2) University of Pennsylvania; {yaoliuai,prtic,fakoor}@amazon.com
Pseudocode | Yes | Algorithm 1 (BCOL Training) and Algorithm 2 (BCOL Inference)
Open Source Code | No | We will release the code to reproduce the experiment upon publication.
Open Datasets | Yes | We evaluate our BCOL algorithm against prior offline RL methods on the OpenAI Gym MuJoCo tasks and AntMaze tasks in the D4RL benchmark [9].
Dataset Splits | No | The paper describes evaluation procedures and random seeds but does not explicitly state training/validation/test dataset splits in terms of percentages or sample counts.
Hardware Specification | Yes | Machine type: AWS EC2 g4dn.2xlarge; GPU: NVIDIA Tesla T4; CPU: Intel Xeon @ 2.5 GHz
Software Dependencies | Yes | CUDA 11.0; NVIDIA driver 450.142.00; PyTorch 1.12.1; Gym 19.0; Python 3.8.13; NumPy 1.21.5; D4RL datasets v2
Experiment Setup | Yes | We list all hyperparameter values and neural architecture details in Table 2. For the SAC-based implementation, we follow CDC's [8] hyperparameter values. The only exception is that we use 2 Q-functions instead of 4, and we sample m = 5 actions from the actor instead of 15.
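
The pinned versions in the Software Dependencies row above can be checked against a local setup with a short script. The script below is a hypothetical helper, not part of the paper's (unreleased) code; the version strings are taken from the table, and the Gym entry listed as "19.0" is assumed to mean the 0.19.0 release.

    """Sanity-check a local environment against the versions reported above.

    Hypothetical helper; expected versions come from the Software
    Dependencies row of the table, not from the paper's code release.
    """
    import platform

    import gym
    import numpy
    import torch

    EXPECTED = {
        "python": "3.8.13",
        "torch": "1.12.1",
        "numpy": "1.21.5",
        "gym": "0.19.0",  # table says "19.0"; assumed to be the 0.19.0 release
        "cuda": "11.0",
    }

    ACTUAL = {
        "python": platform.python_version(),
        "torch": torch.__version__,
        "numpy": numpy.__version__,
        "gym": gym.__version__,
        "cuda": torch.version.cuda,  # None if this torch build has no CUDA support
    }

    for name, expected in EXPECTED.items():
        status = "OK" if ACTUAL[name] == expected else "MISMATCH"
        print(f"{name:>7}: expected {expected}, found {ACTUAL[name]} [{status}]")

Note that a local PyTorch build may report a suffixed version such as "1.12.1+cu110"; the script simply flags that as a mismatch for manual review.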
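
The deviations from CDC quoted in the Experiment Setup row can be summarized as a small configuration sketch. Only the two values marked "from the table" below are stated in the source; every other field is an illustrative SAC-style default, and the key names themselves are invented for this sketch.

    # Hypothetical configuration sketch for the SAC-based BCOL implementation.
    # Only the first two values are stated in the source; the rest are
    # common SAC defaults included purely for illustration.
    bcol_sac_config = {
        "num_q_functions": 2,    # from the table: 2 Q-functions instead of CDC's 4
        "num_actor_samples": 5,  # from the table: m = 5 sampled actions instead of 15
        # Assumed placeholders below (typical SAC defaults, not from the paper):
        "discount": 0.99,
        "target_update_tau": 0.005,
        "learning_rate": 3e-4,
        "batch_size": 256,
    }

The remaining hyperparameters and architecture details are reported in Table 2 of the paper, which is the authoritative source.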