Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data
Authors: Kishan Panaganti, Adam Wierman, Eric Mazumdar
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper presents work that aims to advance the field of Robust Reinforcement Learning for learning robust policies against model parameter mismatches. This work is of a rigorous theoretical nature; hence, the potential societal consequences of our work do not exist, or none of which we feel must be specifically highlighted here. |
| Researcher Affiliation | Academia | Kishan Panaganti¹, Adam Wierman¹, Eric Mazumdar¹. ¹Computing + Mathematical Sciences Department, California Institute of Technology. Correspondence to: Kishan Panaganti <kpb@caltech.edu>. |
| Pseudocode | Yes | Algorithm 1: Robust φ-regularized fitted Q-iteration (RPQ) Algorithm |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository for the described methodology. |
| Open Datasets | No | The paper discusses concepts such as "offline dataset $\mathcal{D}_{P^o}$" and "adaptive datasets" collected on a "nominal model $P^o$" or by a "data distribution $\mu$". However, it does not name or provide access information (link, DOI, specific citation with author/year) for any publicly available, identifiable dataset used for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and theoretical analysis. It does not report on computational experiments and therefore does not provide hardware specifications. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and theoretical analysis. It does not report on computational experiments and therefore does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and describes algorithms and their theoretical properties. It does not detail an empirical "experimental setup" with specific hyperparameters or system-level training settings for a practical implementation. |