reproducibilityindex.ai

Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation

Authors: Shreyas Chaudhari, Ameet Deshpande, Bruno C. da Silva, Philip S. Thomas

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this work, we introduce STAR, a framework for OPE that encompasses a broad range of estimators which include existing OPE methods as special cases that achieve lower mean squared prediction errors. The best STAR estimator outperforms baselines in all twelve cases studied, and even the median STAR estimator surpasses the baselines in seven out of the twelve cases.
Researcher Affiliation	Academia	Shreyas Chaudhari University of Massachusetts schaudhari@cs.umass.edu Ameet Deshpande Princeton University asd@cs.princeton.edu Bruno Castro da Silva University of Massachusetts bsilva@cs.umass.edu Philip S. Thomas University of Massachusetts pthomas@cs.umass.edu
Pseudocode	Yes	Algorithm 1 Overview of STAR(ϕ, c)
Open Source Code	Yes	The code is available at: https://github.com/shreyasc-13/STAR. Anonymized code is submitted as a .zip file with the submission. The codebase will be made public upon acceptance.
Open Datasets	Yes	ICU-Sepsis is built from real-world medical records obtained from the MIMIC-III dataset [24]. MIMIC-III, a freely accessible critical care database. Scientific Data, 3(1):1–9, 2016.
Dataset Splits	No	Estimator selection presents a significant challenge for OPE [48] due to the unavailability of a validation set.
Hardware Specification	Yes	The experiments were run using 32 threads on Xeon E5-2680 CPUs on a computing cluster, bringing the total compute time to roughly 45000 compute hours.
Software Dependencies	No	The paper does not specify software dependencies with version numbers. It mentions "OpenAI Gym" and the "MinAtar testbed" but gives no version for either, which hinders replication.
Experiment Setup	Yes	For the class of abstraction function, we observe that the simple method CluSTAR performs well across all domains, and hence we use it for all experiments. CluSTAR takes as input a single hyperparameter, the number of centroids initialized, denoted by |Z|. We evaluate the following configurations of Z and c for each domain: 1. CartPole: 35 estimators, |Z| ∈ {2, 4, 8, 16, 32, 64, 128}, c ∈ {1, 2, 3, 4, 5}. 2. ICU-Sepsis: 25 estimators, |Z| ∈ {2, 4, 8, 16, 32}, c ∈ {1, 2, 3, 4, 5}. 3. Asterix: 25 estimators, |Z| ∈ {2, 4, 8, 16, 32}, c ∈ {1, 2, 3, 4, 5}.
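
The estimator counts in the Experiment Setup row follow from taking every combination of the number of centroids and c in each domain (7 × 5 = 35 for CartPole, 5 × 5 = 25 for the other two). The sketch below only enumerates those grids as a sanity check. The dictionary layout and the names `GRIDS`, `num_centroids`, and `enumerate_configs` are hypothetical and are not taken from the STAR codebase.

```python
from itertools import product

# Hyperparameter grids copied from the Experiment Setup row.
# "num_centroids" stands for |Z|; the key names are illustrative assumptions.
GRIDS = {
    "CartPole": {"num_centroids": [2, 4, 8, 16, 32, 64, 128], "c": [1, 2, 3, 4, 5]},
    "ICU-Sepsis": {"num_centroids": [2, 4, 8, 16, 32], "c": [1, 2, 3, 4, 5]},
    "Asterix": {"num_centroids": [2, 4, 8, 16, 32], "c": [1, 2, 3, 4, 5]},
}


def enumerate_configs(grid):
    """Return every (num_centroids, c) pair in the grid as a list of tuples."""
    return list(product(grid["num_centroids"], grid["c"]))


# Number of configurations (one estimator per configuration) in each domain.
CONFIG_COUNTS = {domain: len(enumerate_configs(grid)) for domain, grid in GRIDS.items()}

if __name__ == "__main__":
    for domain, count in CONFIG_COUNTS.items():
        print(f"{domain}: {count} estimators")
```

Each tuple returned by `enumerate_configs` corresponds to one STAR(ϕ, c) configuration in the sense used by the table. Constructing the actual estimators would require the authors' code.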