Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound

Authors: Lin Yang, Mengdi Wang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we propose an online RL algorithm, namely MatrixRL, that leverages ideas from linear bandits to learn a low-dimensional representation of the probability transition model while carefully balancing the exploitation-exploration tradeoff. We show that MatrixRL achieves a regret bound $O(H^2 d \log T \sqrt{T})$, where $d$ is the number of features, independent of the number of state-action pairs. MatrixRL has an equivalent kernelized version, which is able to work with an arbitrary reproducing kernel Hilbert space without using explicit features. In this case, kernelized MatrixRL satisfies a regret bound $O(H^2 \widetilde{d} \log T \sqrt{T})$, where $\widetilde{d}$ is the effective dimension of the kernel space. (A minimal sketch of the algorithm's core update appears after this table.)
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, University of California, Los Angeles; Department of Electrical Engineering and Center for Statistics and Machine Learning, Princeton University.
Pseudocode | Yes | Algorithm 1 (Upper Confidence Matrix Reinforcement Learning, UC-MatrixRL) and Algorithm 2 (Kernel MatrixRL: Reinforcement Learning with Kernels).
Open Source Code | No | The paper does not provide an explicit statement or link for the availability of its source code.
Open Datasets | No | The paper is theoretical and does not mention using any specific publicly available datasets for training, nor does it provide access information for any dataset.
Dataset Splits | No | The paper is theoretical and does not describe training, validation, or test dataset splits for experimental reproduction.
Hardware Specification | No | The paper is theoretical and does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments.
Software Dependencies | No | The paper is theoretical and does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate experiments.
Experiment Setup | No | The paper does not provide specific experimental setup details such as concrete hyperparameter values, training configurations, or system-level settings, as it is a theoretical work.
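
Since the authors release no code, the following minimal Python sketch may help readers see the mechanism behind UC-MatrixRL: under the paper's bilinear transition model P(s' | s, a) ≈ φ(s, a)ᵀ M* ψ(s'), the algorithm maintains a ridge-regression estimate of the core matrix M* and a linear-bandit-style elliptical confidence bonus. The class name, constructor parameters, and feature-map interface below are our assumptions for illustration, and the confidence width is simplified relative to the paper's theorem; this is not the authors' implementation.

```python
import numpy as np

class MatrixRLSketch:
    """Illustrative core-matrix estimator in the spirit of UC-MatrixRL.

    Assumes user-supplied feature maps phi(s, a) in R^d_phi and
    psi(s') in R^d_psi; names and interface are hypothetical.
    """

    def __init__(self, d_phi: int, d_psi: int, reg: float = 1.0, beta: float = 1.0):
        self.A = reg * np.eye(d_phi)        # regularized Gram matrix A_t = reg*I + sum phi phi^T
        self.B = np.zeros((d_phi, d_psi))   # cross moments: sum phi(s,a) psi(s')^T
        self.beta = beta                    # confidence width; theory suggests roughly H*sqrt(d log T)

    def update(self, phi_sa: np.ndarray, psi_next: np.ndarray) -> None:
        """Record one observed transition (s, a) -> s'."""
        self.A += np.outer(phi_sa, phi_sa)
        self.B += np.outer(phi_sa, psi_next)

    def estimate(self) -> np.ndarray:
        """Ridge-regression estimate M_t of the core matrix M*."""
        return np.linalg.solve(self.A, self.B)

    def bonus(self, phi_sa: np.ndarray) -> float:
        """Elliptical exploration bonus beta * ||phi||_{A_t^{-1}}, as in linear bandits."""
        return self.beta * float(np.sqrt(phi_sa @ np.linalg.solve(self.A, phi_sa)))
```

In an optimistic value-iteration loop, one would back up Q-values as roughly r(s, a) + φ(s, a)ᵀ M_t Ψᵀ v + bonus(φ(s, a)), truncated to [0, H]. The kernelized variant (Algorithm 2) replaces the explicit features with Gram matrices, which is why its regret depends on the effective dimension $\widetilde{d}$ of the kernel space rather than on $d$.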