reproducibilityindex.ai

Budgeting Counterfactual for Offline RL

Authors: Yao Liu, Pratik Chaudhari, Rasool Fakoor

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we show that the overall performance of our method is better than the state-of-the-art offline RL methods on tasks in the widely-used D4RL benchmarks.
Researcher Affiliation	Collaboration	Yao Liu (1), Pratik Chaudhari (1, 2), Rasool Fakoor (1); (1) Amazon Web Services, (2) University of Pennsylvania; {yaoliuai,prtic,fakoor}@amazon.com
Pseudocode	Yes	Algorithm 1 (BCOL Training) and Algorithm 2 (BCOL Inference)
Open Source Code	No	We will release the code to reproduce the experiment upon publication.
Open Datasets	Yes	We evaluate our BCOL algorithm against prior offline RL methods on the OpenAI Gym MuJoCo tasks and AntMaze tasks in the D4RL benchmark [9].
Dataset Splits	No	The paper describes evaluation procedures and random seeds but does not explicitly state training/validation/test dataset splits in terms of percentages or sample counts.
Hardware Specification	Yes	Machine type: AWS EC2 g4dn.2xlarge; GPU: Tesla T4; CPU: Intel Xeon 2.5GHz
Software Dependencies	Yes	CUDA version 11.0; NVIDIA Driver 450.142.00; PyTorch version 1.12.1; Gym version 19.0; Python version 3.8.13; NumPy version 1.21.5; D4RL datasets version v2
Experiment Setup	Yes	We list all hyperparameter values and neural architecture details in Table 2. For SAC-based implementation, we follow CDC’s [8] hyper-parameter values. The only exception is that we use 2 Q functions instead of 4, and we sample 5 actions from actor (m) instead of 15.
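
The versions in the Software Dependencies row can be compared against a local environment before attempting a reproduction. The following is a minimal sketch, not part of the paper or this index: the helper names `installed_versions` and `find_mismatches` are hypothetical, and the pip distribution names `torch` and `numpy` are assumed to correspond to the listed PyTorch and NumPy entries. Gym is left out because the listed value "19.0" may not match the version string a package index reports.

```python
from importlib import metadata

# Versions copied from the Software Dependencies row.
# The pip distribution names used as keys are assumptions.
EXPECTED = {
    "torch": "1.12.1",
    "numpy": "1.21.5",
}


def installed_versions(names):
    """Return {name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """Return {name: (expected, installed)} for every entry that differs."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        # Drop local build tags such as "+cu113" before comparing.
        have_base = have.split("+")[0] if have else None
        if have_base != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    report = find_mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, have) in report.items():
        print(f"{name}: expected {want}, found {have}")
```

Running the script prints one line per package whose installed version differs from the listed one, and prints nothing when the environment matches.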