Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes
Authors: Ruiquan Huang, Yuan Cheng, Jing Yang, Vincent Tan, Yingbin Liang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our work is the first theoretical study that characterizes the benefits of multi-task RL with PSRs/POMDPs over its single-task counterpart. |
| Researcher Affiliation | Academia | Penn State University, State College, PA 16801, USA ({rzh514,yangjing}@psu.edu); National University of Singapore, 119077, Singapore (yuan.cheng@u.nus.edu, vtan@nus.edu.sg); Ohio State University, Columbus, OH 43210, USA (liang.889@osu.edu). |
| Pseudocode | Yes | Algorithm 1: Upstream Multi-Task PSRs (UMT-PSR); see the illustrative sketch after this table. |
| Open Source Code | No | The paper does not provide an explicit statement of open-source code release, nor does it include any links to code repositories. |
| Open Datasets | No | This is a theoretical paper without empirical studies on specific datasets. It discusses 'data collection' only in the context of its proposed algorithm and neither uses nor provides access information for any publicly available dataset. |
| Dataset Splits | No | As a theoretical paper without empirical experiments, it does not report training, validation, or test dataset splits. |
| Hardware Specification | No | This is a theoretical paper focusing on algorithm design and theoretical guarantees, thus it does not describe any specific hardware used for experiments. |
| Software Dependencies | No | This is a theoretical paper that proposes algorithms and provides theoretical analysis. It does not describe implementation details or specific software dependencies with version numbers. |
| Experiment Setup | No | This paper is theoretical and focuses on mathematical proofs and algorithm design. It does not describe an experimental setup with specific hyperparameters or training configurations. |
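
The paper's Algorithm 1 (UMT-PSR) is referenced above but not reproduced here. As a rough illustration only, below is a minimal, hypothetical sketch of the general upstream multi-task pattern suggested by the algorithm's name: jointly fit a shared model from all tasks' data, then plan exploratory policies and collect more data. Every class and function name (`Task`, `fit_shared_model`, `exploratory_policy`, etc.) is an assumption for illustration, not the paper's notation, and the placeholder bodies do not implement its confidence-set construction or guarantees.

```python
import random
from dataclasses import dataclass

# Hypothetical skeleton of an upstream multi-task loop in the spirit of
# UMT-PSR. All names and placeholder bodies are illustrative assumptions,
# not the paper's Algorithm 1.

@dataclass
class Task:
    """A toy episodic environment: step(action) -> (observation, done)."""
    bias: float  # probability of observing 1

    def reset(self):
        self.t = 0

    def step(self, action: int):
        self.t += 1
        obs = 1 if random.random() < self.bias else 0
        return obs, self.t >= 5  # fixed horizon of 5 steps

def collect_episode(task: Task, policy) -> list:
    """Roll out one episode; return the (action, observation) trajectory."""
    task.reset()
    traj, done = [], False
    while not done:
        action = policy(traj)
        obs, done = task.step(action)
        traj.append((action, obs))
    return traj

def fit_shared_model(datasets: list) -> float:
    """Placeholder for a joint fit over all tasks' pooled data; here we
    only estimate one shared statistic (the mean observation)."""
    pooled = [obs for data in datasets for traj in data for _, obs in traj]
    return sum(pooled) / max(len(pooled), 1)

def exploratory_policy(shared_stat: float):
    """Placeholder for the exploratory planning step; here, uniform random."""
    def policy(traj):
        return random.randint(0, 1)
    return policy

def upstream_multitask_loop(tasks: list, n_iters: int = 10) -> float:
    datasets = [[] for _ in tasks]
    shared_stat = 0.5
    for _ in range(n_iters):
        policy = exploratory_policy(shared_stat)   # plan from current model
        for i, task in enumerate(tasks):           # collect on every task
            datasets[i].append(collect_episode(task, policy))
        shared_stat = fit_shared_model(datasets)   # joint fit across tasks
    return shared_stat

if __name__ == "__main__":
    print(upstream_multitask_loop([Task(0.3), Task(0.7)]))
```

The point of the sketch is only the control flow that gives multi-task upstream learning its statistical benefit: data from all tasks feeds a single shared fit each iteration, rather than each task being learned in isolation.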