Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Authors: Lin Yang, Mengdi Wang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we propose an online RL algorithm, namely the Matrix RL, that leverages ideas from linear bandit to learn a low-dimensional representation of the probability transition model while carefully balancing the exploitation-exploration tradeoff. We show that Matrix RL achieves a regret bound $O(H^2 d \log T \sqrt{T})$ where $d$ is the number of features, independent with the number of state-action pairs. Matrix RL has an equivalent kernelized version, which is able to work with an arbitrary kernel Hilbert space without using explicit features. In this case, the kernelized Matrix RL satisfies a regret bound $O(H^2 \widetilde{d} \log T \sqrt{T})$, where $\widetilde{d}$ is the effective dimension of the kernel space. |
| Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, University of California, Los Angeles; (2) Department of Electrical Engineering and Center for Statistics and Machine Learning, Princeton University. |
| Pseudocode | Yes | Algorithm 1 Upper Confidence Matrix Reinforcement Learning (UC-Matrix RL) and Algorithm 2 Kernel Matrix RL: Reinforcement Learning with Kernels |
| Open Source Code | No | The paper does not provide an explicit statement or link for the availability of its source code. |
| Open Datasets | No | The paper is theoretical and does not mention using any specific publicly available datasets for training, nor does it provide access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe training, validation, or test dataset splits for experimental reproduction. |
| Hardware Specification | No | The paper is theoretical and does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate experiments. |
| Experiment Setup | No | The paper does not provide specific experimental setup details such as concrete hyperparameter values, training configurations, or system-level settings, as it is a theoretical work. |