Near-Optimal Representation Learning for Linear Bandits and Linear RL
Authors: Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play M linear bandits with dimension d concurrently, and these bandits share a common k-dimensional linear representation so that k ≪ d and k ≪ M. We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve Õ(M√(dkT) + d√(kMT)) regret, with T being the number of total steps. Our regret significantly improves upon the baseline Õ(Md√T) achieved by solving each task independently. We further develop a lower bound that shows our regret is near-optimal when d > M. Furthermore, we extend the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error (Zanette et al., 2020a). |
| Researcher Affiliation | Collaboration | Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; Department of Electrical and Computer Engineering, Princeton University; Amazon; Center for Data Science, Peking University. |
| Pseudocode | Yes | Algorithm 1 Multi-Task Low-Rank OFUL |
| Open Source Code | No | First, our algorithms are statistically sample-efficient, but a computationally efficient implementation is still unknown, although we conjecture our MTLR-OFUL algorithm is computationally efficient. How to design both computationally and statistically efficient algorithms in our multi-task setting is an interesting problem for future research. |
| Open Datasets | No | The paper describes theoretical models for multi-task linear bandits and multi-task episodic RL without specifying concrete, publicly available datasets for training or evaluation. |
| Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design, and does not provide specific training/test/validation dataset splits. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design, and does not specify any hardware used for experiments. |
| Software Dependencies | No | The paper presents theoretical algorithms and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details on experimental setup such as hyperparameters or training configurations. |
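Since the paper releases no code and notes that a computationally efficient implementation of MTLR-OFUL is unknown, a faithful reproduction is not possible from the paper alone. As an illustration of the problem setting only, the sketch below simulates M linear bandits whose parameters share a common rank-k representation (θ_m = B w_m with a d×k matrix B) and runs the independent per-task OFUL/LinUCB-style baseline that the paper's Õ(Md√T) comparison refers to. All constants (M, d, k, T, the exploration bonus `beta`, the regularizer `lam`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, k, T = 8, 20, 2, 500  # tasks, ambient dim, shared dim (k << d, k << M), steps per task

# Shared low-rank structure: each task parameter theta_m = B @ w_m with a
# common d x k matrix B, mirroring the paper's modelling assumption.
B, _ = np.linalg.qr(rng.standard_normal((d, k)))
W = rng.standard_normal((k, M))
Theta = B @ W  # d x M, one column per task

def independent_oful_regret(lam=1.0, beta=1.0, n_arms=20, noise=0.1):
    """Per-task OFUL-style baseline that ignores the shared representation."""
    total_regret = 0.0
    for m in range(M):
        theta = Theta[:, m]
        V = lam * np.eye(d)  # regularized Gram matrix
        b = np.zeros(d)      # running sum of x_t * reward_t
        for _ in range(T):
            arms = rng.standard_normal((n_arms, d))
            arms /= np.linalg.norm(arms, axis=1, keepdims=True)
            theta_hat = np.linalg.solve(V, b)   # ridge estimate
            Vinv = np.linalg.inv(V)
            # Optimistic index: <x, theta_hat> + beta * ||x||_{V^{-1}}
            ucb = arms @ theta_hat + beta * np.sqrt(
                np.einsum('ij,jk,ik->i', arms, Vinv, arms))
            x = arms[np.argmax(ucb)]
            reward = x @ theta + noise * rng.standard_normal()
            V += np.outer(x, x)
            b += x * reward
            total_regret += np.max(arms @ theta) - x @ theta
    return total_regret

print(f"cumulative regret of the independent baseline: {independent_oful_regret():.1f}")
```

MTLR-OFUL would additionally estimate the shared matrix B from data pooled across all M tasks and restrict each task's confidence set to the learned k-dimensional subspace, which is what yields the improved Õ(M√(dkT) + d√(kMT)) regret; that step is omitted here because the paper gives no computationally efficient procedure for it.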