Model-Free Robust φ-Divergence Reinforcement Learning Using Both Offline and Online Data

Authors: Kishan Panaganti, Adam Wierman, Eric Mazumdar

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The paper aims to advance robust reinforcement learning, i.e., learning policies that are robust to model parameter mismatches. The work is rigorously theoretical; the authors state that potential societal consequences of the work either do not exist or need not be specifically highlighted. |
| Researcher Affiliation | Academia | Kishan Panaganti, Adam Wierman, and Eric Mazumdar; Computing + Mathematical Sciences Department, California Institute of Technology. Correspondence to: Kishan Panaganti <kpb@caltech.edu>. |
| Pseudocode | Yes | Algorithm 1: Robust φ-regularized fitted Q-iteration (RPQ). A hedged illustrative sketch follows this table. |
| Open Source Code | No | The paper neither states that its source code is publicly available nor links to a code repository for the described methodology. |
| Open Datasets | No | The paper discusses an offline dataset D^{P^o} and adaptive datasets collected under a nominal model P^o or from a data distribution µ, but it does not name or provide access information (link, DOI, or a specific author/year citation) for any publicly available dataset used for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and conducts no empirical experiments, so it specifies no training, validation, or test splits. |
| Hardware Specification | No | The paper focuses on algorithm design and theoretical analysis; it reports no computational experiments and hence no hardware specifications. |
| Software Dependencies | No | For the same reason, the paper lists no specific software dependencies or version numbers. |
| Experiment Setup | No | The paper presents algorithms and their theoretical guarantees; it does not detail an empirical experiment setup with hyperparameters or system-level training settings. |
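
The RPQ algorithm referenced in the table performs fitted Q-iteration against a robust, φ-divergence-regularized Bellman operator. As a minimal illustration only, the sketch below instantiates the φ-divergence as the KL divergence, for which the regularized inner minimization has the well-known closed-form dual inf_P { E_P[V] + σ KL(P‖P^o) } = -σ log E_{P^o}[exp(-V/σ)], and runs tabular robust value iteration against a known nominal kernel. This is a toy analogue under stated assumptions, not the paper's function-approximation RPQ procedure; the function names (robust_kl_backup, robust_value_iteration) and the tabular setting are illustrative.

```python
# Toy tabular sketch of a KL-regularized robust Bellman backup (an
# illustrative analogue of RPQ, not the paper's fitted-Q procedure).
# Uses the Gibbs variational (dual) identity for the KL divergence:
#   inf_P { E_P[V] + sigma * KL(P || P_o) } = -sigma * log E_{P_o}[exp(-V/sigma)]
# so each backup needs only the nominal kernel P_o.
import numpy as np


def robust_kl_backup(P_o, R, V, gamma=0.95, sigma=1.0):
    """One robust Bellman backup under KL-regularized model uncertainty.

    P_o:   nominal transition kernel, shape (S, A, S)
    R:     reward table, shape (S, A)
    V:     current state values, shape (S,)
    sigma: regularization weight on KL(P || P_o); larger = less robust
    Returns Q, shape (S, A).
    """
    # The dual is a soft-min of V under the nominal kernel (log-sum-exp
    # with negative temperature). Shift by min(V) for numerical stability.
    m = V.min()
    soft_min = m - sigma * np.log(
        np.einsum("sat,t->sa", P_o, np.exp(-(V - m) / sigma))
    )
    return R + gamma * soft_min


def robust_value_iteration(P_o, R, gamma=0.95, sigma=1.0, iters=500):
    """Iterate the robust backup to (approximate) convergence."""
    S, A, _ = P_o.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = robust_kl_backup(P_o, R, V, gamma, sigma)
        V = Q.max(axis=1)  # greedy value over actions
    return Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 5, 2
    P_o = rng.dirichlet(np.ones(S), size=(S, A))  # random nominal MDP
    R = rng.uniform(size=(S, A))
    Q = robust_value_iteration(P_o, R)
    print("robust greedy policy:", Q.argmax(axis=1))
```

As σ → ∞ the soft-min approaches the plain expectation under P^o, recovering the standard non-robust Bellman update; smaller σ yields more pessimistic (more robust) values.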