Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Transfer Q-Learning with Composite MDP Structures
Authors: Jinhang Chai, Elynn Chen, Lin Yang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework where high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. Our theoretical analysis provides rigorous guarantees for how UCB-TQL simultaneously exploits shared dynamics while adapting to task-specific variations. This work represents a significant step toward bridging the gap between empirical success of transfer RL and theoretical understanding by providing a rigorous analysis of how structural similarities in transition dynamics enable efficient knowledge transfer. |
| Researcher Affiliation | Academia | (1) Department of Operations Research and Financial Engineering, Princeton University; (2) Department of Technology, Operations, and Statistics, New York University; (3) Department of Electrical and Computer Engineering, UCLA. |
| Pseudocode | Yes | Algorithm 1: UCB-Q Learning for HD Composite MDPs; Algorithm 2: UCB-TQL for Composite MDPs |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a code repository, or mention that code is available in supplementary materials. |
| Open Datasets | No | The paper is theoretical in nature, focusing on a composite MDP framework and algorithms. It does not conduct empirical studies using specific datasets; therefore, no information about publicly available or open datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not perform experiments with specific datasets. Consequently, there is no mention of training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and regret analysis rather than empirical experimentation. Therefore, no specific hardware used for running experiments is mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe the implementation of its algorithms or any experimental setup. Therefore, no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical, presenting a new MDP framework and associated algorithms with provable guarantees. It does not include any experimental validation, and as such, no specific experimental setup details, hyperparameters, or system-level training settings are provided. |
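
The Research Type row quotes the paper's central modeling idea: transition dynamics written as the sum of a low-rank component (shared structure) and a sparse component (task-specific variation). The snippet below is only a minimal illustration of that "low-rank plus sparse" shape on a plain matrix. It is not the paper's construction: the function name `composite_matrix`, the dimensions, and the Gaussian entries are all assumptions chosen for illustration, and the result is not normalized into a valid transition kernel.

```python
import numpy as np

def composite_matrix(n=6, rank=2, n_sparse=3, seed=0):
    """Return (low_rank, sparse, total) with total = low_rank + sparse.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Low-rank part: product of two thin factors, so its rank is at most `rank`.
    left = rng.normal(size=(n, rank))
    right = rng.normal(size=(rank, n))
    low_rank = left @ right

    # Sparse part: zero everywhere except `n_sparse` randomly chosen entries.
    sparse = np.zeros((n, n))
    positions = rng.choice(n * n, size=n_sparse, replace=False)
    sparse.flat[positions] = rng.normal(size=n_sparse)

    return low_rank, sparse, low_rank + sparse

low_rank, sparse, total = composite_matrix()
print("rank of shared part:", np.linalg.matrix_rank(low_rank))
print("nonzeros in task-specific part:", np.count_nonzero(sparse))
```

In this toy setting, the low-rank factor is what several tasks would have in common, while only the few nonzero entries of the sparse matrix would differ from task to task.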