Model-Free Robust φ-Divergence Reinforcement Learning Using Both Offline and Online Data

Authors: Kishan Panaganti, Adam Wierman, Eric Mazumdar

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The paper aims to advance robust reinforcement learning, i.e., learning policies that are robust to model parameter mismatches. The work is rigorously theoretical; the authors state that potential societal consequences of the work either do not exist or need not be specifically highlighted. |
| Researcher Affiliation | Academia | Kishan Panaganti, Adam Wierman, and Eric Mazumdar; Computing + Mathematical Sciences Department, California Institute of Technology. Correspondence to: Kishan Panaganti <kpb@caltech.edu>. |
| Pseudocode | Yes | Algorithm 1: Robust φ-regularized fitted Q-iteration (RPQ). A hedged illustrative sketch follows this table. |
| Open Source Code | No | The paper neither states that its source code is publicly available nor links to a code repository for the described methodology. |
| Open Datasets | No | The paper discusses an offline dataset D^{P^o} and adaptive datasets collected under a nominal model P^o or from a data distribution µ, but it does not name or provide access information (link, DOI, or a specific author/year citation) for any publicly available dataset used for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and conducts no empirical experiments, so it specifies no training, validation, or test splits. |
| Hardware Specification | No | The paper focuses on algorithm design and theoretical analysis; it reports no computational experiments and hence no hardware specifications. |
| Software Dependencies | No | For the same reason, the paper lists no specific software dependencies or version numbers. |
| Experiment Setup | No | The paper presents algorithms and their theoretical guarantees; it does not detail an empirical experiment setup with hyperparameters or system-level training settings. |
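
The RPQ algorithm referenced in the table performs fitted Q-iteration against a robust, φ-divergence-regularized Bellman operator. As a minimal illustration only, the sketch below instantiates the φ-divergence as the KL divergence, for which the regularized inner minimization has the well-known closed-form dual inf_P { E_P[V] + σ KL(P‖P^o) } = -σ log E_{P^o}[exp(-V/σ)], and runs tabular robust value iteration against a known nominal kernel. This is a toy analogue under stated assumptions, not the paper's function-approximation RPQ procedure; the function names (robust_kl_backup, robust_value_iteration) and the tabular setting are illustrative.

```python
# Toy tabular sketch of a KL-regularized robust Bellman backup (an
# illustrative analogue of RPQ, not the paper's fitted-Q procedure).
# Uses the Gibbs variational (dual) identity for the KL divergence:
#   inf_P { E_P[V] + sigma * KL(P || P_o) } = -sigma * log E_{P_o}[exp(-V/sigma)]
# so each backup needs only the nominal kernel P_o.
import numpy as np


def robust_kl_backup(P_o, R, V, gamma=0.95, sigma=1.0):
    """One robust Bellman backup under KL-regularized model uncertainty.

    P_o:   nominal transition kernel, shape (S, A, S)
    R:     reward table, shape (S, A)
    V:     current state values, shape (S,)
    sigma: regularization weight on KL(P || P_o); larger = less robust
    Returns Q, shape (S, A).
    """
    # The dual is a soft-min of V under the nominal kernel (log-sum-exp
    # with negative temperature). Shift by min(V) for numerical stability.
    m = V.min()
    soft_min = m - sigma * np.log(
        np.einsum("sat,t->sa", P_o, np.exp(-(V - m) / sigma))
    )
    return R + gamma * soft_min


def robust_value_iteration(P_o, R, gamma=0.95, sigma=1.0, iters=500):
    """Iterate the robust backup to (approximate) convergence."""
    S, A, _ = P_o.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = robust_kl_backup(P_o, R, V, gamma, sigma)
        V = Q.max(axis=1)  # greedy value over actions
    return Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 5, 2
    P_o = rng.dirichlet(np.ones(S), size=(S, A))  # random nominal MDP
    R = rng.uniform(size=(S, A))
    Q = robust_value_iteration(P_o, R)
    print("robust greedy policy:", Q.argmax(axis=1))
```

As σ → ∞ the soft-min approaches the plain expectation under P^o, recovering the standard non-robust Bellman update; smaller σ yields more pessimistic (more robust) values.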