Near-Optimal Representation Learning for Linear Bandits and Linear RL

Authors: Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play M linear bandits with dimension d concurrently, and these bandits share a common k-dimensional linear representation so that k ≪ d and k ≪ M. We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve Õ(M√(dkT) + d√(kMT)) regret, with T being the number of total steps. Our regret significantly improves upon the baseline Õ(Md√T) achieved by solving each task independently (a term-by-term comparison is sketched after the table). We further develop a lower bound that shows our regret is near-optimal when d > M. Furthermore, we extend the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error (Zanette et al., 2020a).
Researcher Affiliation | Collaboration | 1. Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; 2. Department of Electrical and Computer Engineering, Princeton University; 3. Amazon; 4. Center for Data Science, Peking University.
Pseudocode | Yes | Algorithm 1: Multi-Task Low-Rank OFUL (MTLR-OFUL). An illustrative sketch of this algorithmic template appears after the table.
Open Source Code | No | First, our algorithms are statistically sample-efficient, but a computationally efficient implementation is still unknown, although we conjecture our MTLR-OFUL algorithm is computationally efficient. How to design both computationally and statistically efficient algorithms in our multi-task setting is an interesting problem for future research.
Open Datasets | No | The paper describes theoretical models for multi-task linear bandits and multi-task episodic RL without specifying concrete, publicly available datasets for training or evaluation.
Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design, and does not provide specific training/test/validation dataset splits.
Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design, and does not specify any hardware used for experiments.
Software Dependencies | No | The paper presents theoretical algorithms and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not include details on experimental setup such as hyperparameters or training configurations.
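
Two illustrations follow. First, the improvement claimed in the abstract can be checked term by term, using only the quantities defined there (Õ hides polylogarithmic factors):

```latex
% Term-by-term comparison of the multi-task regret bound with the
% independent-task baseline (\tilde{O} hides polylogarithmic factors):
\[
  M\sqrt{dkT} \;\le\; Md\sqrt{T} \;\iff\; k \le d,
  \qquad\qquad
  d\sqrt{kMT} \;\le\; Md\sqrt{T} \;\iff\; k \le M.
\]
```

So the multi-task bound Õ(M√(dkT) + d√(kMT)) is smaller than the independent-task baseline Õ(Md√T) exactly when k ≲ d and k ≲ M, i.e. in the k ≪ d, k ≪ M regime the abstract assumes.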
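Second, since no open-source implementation accompanies the paper, here is a minimal Python sketch of the general template the abstract and Algorithm 1 suggest: jointly estimate a rank-k parameter matrix across the M tasks, then act optimistically (OFUL-style) within each task. This is not the paper's exact MTLR-OFUL; the SVD projection stands in for its joint low-rank least-squares step, and the bonus scale `alpha` and all problem constants are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch (not the paper's exact MTLR-OFUL): M linear bandits of
# dimension d whose stacked parameter matrix Theta = [theta_1, ..., theta_M]
# has rank k, i.e. the tasks share a k-dimensional linear representation.

rng = np.random.default_rng(0)
M, d, k, T, n_arms = 10, 20, 3, 500, 15   # assumed problem sizes

B_true = np.linalg.qr(rng.normal(size=(d, k)))[0]   # shared representation
W_true = rng.normal(size=(k, M))                    # per-task coefficients
Theta_true = B_true @ W_true                        # rank-k parameter matrix

A = np.stack([np.eye(d)] * M)   # (M, d, d) regularized Gram matrices
b = np.zeros((M, d))            # (M, d) reward-weighted feature sums
alpha = 1.0                     # exploration-bonus scale (assumed)

for t in range(T):
    # Per-task ridge estimates, then a rank-k SVD projection so all tasks
    # share a common k-dimensional column space (stand-in for the paper's
    # joint low-rank least-squares step).
    Theta_hat = np.stack([np.linalg.solve(A[m], b[m]) for m in range(M)],
                         axis=1)                    # (d, M)
    U, s, Vt = np.linalg.svd(Theta_hat, full_matrices=False)
    Theta_hat = (U[:, :k] * s[:k]) @ Vt[:k]         # rank-k projection

    for m in range(M):
        X = rng.normal(size=(n_arms, d)) / np.sqrt(d)   # fresh action set
        A_inv = np.linalg.inv(A[m])
        bonus = np.sqrt(np.einsum('ad,de,ae->a', X, A_inv, X))
        ucb = X @ Theta_hat[:, m] + alpha * bonus       # optimistic scores
        x = X[np.argmax(ucb)]
        reward = x @ Theta_true[:, m] + 0.1 * rng.normal()
        A[m] += np.outer(x, x)
        b[m] += reward * x
```

The rank-k projection is where the tasks share information: with k ≪ min(d, M), the joint estimate concentrates faster than M independent d-dimensional ridge regressions, which is the intuition behind the improved regret.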