Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design

Authors: Shuze Liu, Shangtong Zhang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "7. Empirical Results: In this section, we present empirical results comparing our methods against three baselines..."
Researcher Affiliation | Academia | "Department of Computer Science, University of Virginia. Correspondence to: Shuze Liu <shuzeliu@virginia.edu>."
Pseudocode | Yes | "Algorithm 1: Offline Data Informed (ODI) algorithm"
Open Source Code | Yes | "Our implementation is made publicly available to facilitate future research: https://github.com/ShuzeLiu/Behavior-Policy-Design-for-Policy-Evaluation"
Open Datasets | No | The paper describes how the data was generated for the Gridworld and MuJoCo environments but does not provide a link, DOI, specific repository name, or formal citation (authors and year) for a publicly available dataset.
Dataset Splits | No | "We split the offline data into a training set and a test set. We tune all hyperparameters offline based on the supervised learning loss and fitted Q-learning loss on the test set."
Hardware Specification | No | The paper does not explicitly describe the hardware used to run the experiments (e.g., GPU/CPU models or memory specifications).
Software Dependencies | No | The paper mentions using the Adam optimizer and the PPO algorithm, with references to other papers, but it does not specify version numbers for any software dependencies (programming languages, libraries, or frameworks) used in the implementation.
Experiment Setup | Yes | "All hyperparameters of our methods required to learn μ̂ are tuned offline and are the same across all MuJoCo and Gridworld experiments. With the Adam optimizer (Kingma & Ba, 2015), we search the learning rates in {2^-20, 2^-18, ..., 2^0} to minimize the loss on the offline data and use the learning rate 2^-10 on all learning processes." (This tuning protocol is sketched below.)
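
The Dataset Splits and Experiment Setup rows describe a single offline tuning protocol: split the logged data into a training set and a test set, sweep Adam learning rates 2^-20, 2^-18, ..., 2^0, and keep the rate that minimizes the held-out loss. Below is a minimal sketch of that protocol, not the authors' code; the model factory, loss function, and data iterables (make_model, loss_fn, train_set, test_set) are hypothetical placeholders, and a PyTorch-style setup is assumed.

```python
# Hedged sketch of the offline learning-rate sweep quoted above. Assumptions
# (not taken from the paper's repository): PyTorch, (x, y) minibatch iterables,
# a model factory, and a pointwise loss function.
import torch


def tune_learning_rate(make_model, loss_fn, train_set, test_set, epochs=10):
    """Sweep Adam learning rates 2^-20, 2^-18, ..., 2^0 offline and return the
    rate (with its trained model) that minimizes the loss on the test split."""
    best_lr, best_loss, best_model = None, float("inf"), None
    for exponent in range(-20, 1, 2):          # 2^-20, 2^-18, ..., 2^0
        lr = 2.0 ** exponent
        model = make_model()                   # fresh model for each candidate rate
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in train_set:             # offline training split
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()
                optimizer.step()
        with torch.no_grad():                  # held-out offline test split
            test_loss = sum(loss_fn(model(x), y).item() for x, y in test_set)
        if test_loss < best_loss:
            best_lr, best_loss, best_model = lr, test_loss, model
    return best_lr, best_model
```

In the paper's setting, the swept loss corresponds to the supervised learning loss and fitted Q-learning loss used to learn μ̂, and the selected rate (2^-10 in the quote) is then reused across all Gridworld and MuJoCo experiments.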