Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Near-Optimal Representation Learning for Linear Bandits and Linear RL
Authors: Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play M linear bandits with dimension d concurrently, and these bandits share a common k-dimensional linear representation so that k d and k M. We propose a sample-efficient algorithm, MTLROFUL, which leverages the shared representation to achieve O(M k MT) regret, with T being the number of total steps. Our regret significantly improves upon the baseline O(Md T) achieved by solving each task independently. We further develop a lower bound that shows our regret is near-optimal when d > M. Furthermore, we extend the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error (Zanette et al., 2020a). |
| Researcher Affiliation | Collaboration | 1Key Laboratory of Machine Perception, MOE, School of EECS, Peking University 2Department of Electrical and Computer Engineering, Princeton University 3Amazon 4Center for Data Science, Peking University. |
| Pseudocode | Yes | Algorithm 1 Multi-Task Low-Rank OFUL |
| Open Source Code | No | First, our algorithms are statistically sample-efficient, but a computationally efficient implementation is still unknown, although we conjecture our MTLR-OFUL algorithm is computationally efficient. How to design both computationally and statistically efficient algorithms in our multi-task setting is an interesting problem for future research. |
| Open Datasets | No | The paper describes theoretical models for multi-task linear bandits and multi-task episodic RL without specifying concrete, publicly available datasets for training or evaluation. |
| Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design, and does not provide specific training/test/validation dataset splits. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design, and does not specify any hardware used for experiments. |
| Software Dependencies | No | The paper presents theoretical algorithms and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details on experimental setup such as hyperparameters or training configurations. |