Near-Optimal Representation Learning for Linear Bandits and Linear RL

Authors: Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play M linear bandits with dimension d concurrently, and these bandits share a common k-dimensional linear representation so that k ≪ d and k ≪ M. We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve Õ(M√(dkT) + d√(kMT)) regret, with T being the number of total steps. Our regret significantly improves upon the baseline Õ(Md√T) achieved by solving each task independently (a term-by-term comparison is sketched after the table). We further develop a lower bound that shows our regret is near-optimal when d > M. Furthermore, we extend the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error (Zanette et al., 2020a).
Researcher Affiliation | Collaboration | 1. Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; 2. Department of Electrical and Computer Engineering, Princeton University; 3. Amazon; 4. Center for Data Science, Peking University.
Pseudocode | Yes | Algorithm 1: Multi-Task Low-Rank OFUL (MTLR-OFUL). An illustrative sketch of this algorithmic template appears after the table.
Open Source Code | No | First, our algorithms are statistically sample-efficient, but a computationally efficient implementation is still unknown, although we conjecture our MTLR-OFUL algorithm is computationally efficient. How to design both computationally and statistically efficient algorithms in our multi-task setting is an interesting problem for future research.
Open Datasets | No | The paper describes theoretical models for multi-task linear bandits and multi-task episodic RL without specifying concrete, publicly available datasets for training or evaluation.
Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design, and does not provide specific training/test/validation dataset splits.
Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design, and does not specify any hardware used for experiments.
Software Dependencies | No | The paper presents theoretical algorithms and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not include details on experimental setup such as hyperparameters or training configurations.
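
Two illustrations follow. First, the improvement claimed in the abstract can be checked term by term, using only the quantities defined there (Õ hides polylogarithmic factors):

```latex
% Term-by-term comparison of the multi-task regret bound with the
% independent-task baseline (\tilde{O} hides polylogarithmic factors):
\[
  M\sqrt{dkT} \;\le\; Md\sqrt{T} \;\iff\; k \le d,
  \qquad\qquad
  d\sqrt{kMT} \;\le\; Md\sqrt{T} \;\iff\; k \le M.
\]
```

So the multi-task bound Õ(M√(dkT) + d√(kMT)) is smaller than the independent-task baseline Õ(Md√T) exactly when k ≲ d and k ≲ M, i.e. in the k ≪ d, k ≪ M regime the abstract assumes.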
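Second, since no open-source implementation accompanies the paper, here is a minimal Python sketch of the general template the abstract and Algorithm 1 suggest: jointly estimate a rank-k parameter matrix across the M tasks, then act optimistically (OFUL-style) within each task. This is not the paper's exact MTLR-OFUL; the SVD projection stands in for its joint low-rank least-squares step, and the bonus scale `alpha` and all problem constants are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch (not the paper's exact MTLR-OFUL): M linear bandits of
# dimension d whose stacked parameter matrix Theta = [theta_1, ..., theta_M]
# has rank k, i.e. the tasks share a k-dimensional linear representation.

rng = np.random.default_rng(0)
M, d, k, T, n_arms = 10, 20, 3, 500, 15   # assumed problem sizes

B_true = np.linalg.qr(rng.normal(size=(d, k)))[0]   # shared representation
W_true = rng.normal(size=(k, M))                    # per-task coefficients
Theta_true = B_true @ W_true                        # rank-k parameter matrix

A = np.stack([np.eye(d)] * M)   # (M, d, d) regularized Gram matrices
b = np.zeros((M, d))            # (M, d) reward-weighted feature sums
alpha = 1.0                     # exploration-bonus scale (assumed)

for t in range(T):
    # Per-task ridge estimates, then a rank-k SVD projection so all tasks
    # share a common k-dimensional column space (stand-in for the paper's
    # joint low-rank least-squares step).
    Theta_hat = np.stack([np.linalg.solve(A[m], b[m]) for m in range(M)],
                         axis=1)                    # (d, M)
    U, s, Vt = np.linalg.svd(Theta_hat, full_matrices=False)
    Theta_hat = (U[:, :k] * s[:k]) @ Vt[:k]         # rank-k projection

    for m in range(M):
        X = rng.normal(size=(n_arms, d)) / np.sqrt(d)   # fresh action set
        A_inv = np.linalg.inv(A[m])
        bonus = np.sqrt(np.einsum('ad,de,ae->a', X, A_inv, X))
        ucb = X @ Theta_hat[:, m] + alpha * bonus       # optimistic scores
        x = X[np.argmax(ucb)]
        reward = x @ Theta_true[:, m] + 0.1 * rng.normal()
        A[m] += np.outer(x, x)
        b[m] += reward * x
```

The rank-k projection is where the tasks share information: with k ≪ min(d, M), the joint estimate concentrates faster than M independent d-dimensional ridge regressions, which is the intuition behind the improved regret.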