Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Budgeting Counterfactual for Offline RL
Authors: Yao Liu, Pratik Chaudhari, Rasool Fakoor
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that the overall performance of our method is better than the state-of-the-art offline RL methods on tasks in the widely-used D4RL benchmarks. |
| Researcher Affiliation | Collaboration | Yao Liu¹, Pratik Chaudhari¹,², Rasool Fakoor¹; ¹Amazon Web Services, ²University of Pennsylvania; EMAIL |
| Pseudocode | Yes | Algorithm 1 BCOL Training and Algorithm 2 BCOL Inference |
| Open Source Code | No | We will release the code to reproduce the experiment upon publication. |
| Open Datasets | Yes | We evaluate our BCOL algorithm against prior offline RL methods on the OpenAI Gym MuJoCo tasks and AntMaze tasks in the D4RL benchmark [9]. |
| Dataset Splits | No | The paper describes evaluation procedures and random seeds but does not explicitly state training/validation/test dataset splits in terms of percentages or sample counts. |
| Hardware Specification | Yes | Machine type: AWS EC2 g4dn.2xlarge; GPU: Tesla T4; CPU: Intel Xeon 2.5GHz |
| Software Dependencies | Yes | CUDA version 11.0; NVIDIA Driver 450.142.00; PyTorch version 1.12.1; Gym version 19.0; Python version 3.8.13; NumPy version 1.21.5; D4RL datasets version v2 |
| Experiment Setup | Yes | We list all hyperparameter values and neural architecture details in Table 2. For SAC-based implementation, we follow CDC’s [8] hyper-parameter values. The only exception is that we use 2 Q functions instead of 4, and we sample 5 actions from actor (m) instead of 15. |
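
The software versions reported in the table can be compared against a local environment before attempting a reproduction. The following is a minimal sketch, not part of the paper or its code release: the helper `compare_versions` is hypothetical, the package names `torch`, `gym`, and `numpy` are assumed to be the relevant PyPI distributions, and the version strings are copied verbatim from the table (including "19.0" for Gym, which may not match the string an installed package reports).

```python
from importlib import metadata

# Versions as reported in the table above, copied verbatim.
# Assumption: these PyPI distribution names correspond to the listed software.
REPORTED = {
    "torch": "1.12.1",
    "gym": "19.0",
    "numpy": "1.21.5",
}


def compare_versions(reported, lookup=metadata.version):
    """Return {package: (reported, installed or None, matches)}.

    `lookup` defaults to importlib.metadata.version and can be replaced
    with any callable that raises PackageNotFoundError for missing packages.
    """
    result = {}
    for name, want in reported.items():
        try:
            have = lookup(name)
        except metadata.PackageNotFoundError:
            have = None
        # Ignore a local build suffix such as "+cu113" when comparing.
        matches = have is not None and have.split("+")[0] == want
        result[name] = (want, have, matches)
    return result


if __name__ == "__main__":
    for name, (want, have, ok) in compare_versions(REPORTED).items():
        print(f"{name}: reported {want}, installed {have}, match={ok}")
```

A mismatch here does not by itself mean results cannot be reproduced; it only flags where the local environment differs from the one reported.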