Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration
Authors: Priyank Agrawal, Jinglin Chen, Nan Jiang
AAAI 2021, pp. 6566-6573
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper studies regret minimization with randomized value functions in reinforcement learning. In tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of a classical Thompson Sampling (TS)-like algorithm, randomized least-squares value iteration (RLSVI). Our $\tilde{O}(H^2S\sqrt{AT})$ high-probability worst-case regret bound improves the previous sharpest worst-case regret bounds for RLSVI and matches the existing state-of-the-art worst-case TS-based regret bounds. |
| Researcher Affiliation | Academia | Priyank Agrawal*, Jinglin Chen*, Nan Jiang; University of Illinois at Urbana-Champaign, Urbana, IL 61801; priyank4@illinois.edu, jinglinc@illinois.edu, nanjiang@illinois.edu |
| Pseudocode | Yes | Algorithm 1 C-RLSVI (an illustrative sketch follows the table) |
| Open Source Code | No | The paper is theoretical and does not mention releasing any source code for the methodology described. |
| Open Datasets | No | The paper is purely theoretical and does not involve the use of datasets for training. |
| Dataset Splits | No | The paper is purely theoretical and does not involve dataset splits for validation. |
| Hardware Specification | No | The paper is purely theoretical and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is purely theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is purely theoretical and does not provide details about experimental setup, hyperparameters, or training settings. |
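Since the paper releases no code, the following is a minimal illustrative sketch of what a C-RLSVI-style planning step could look like in a tabular finite-horizon MDP: noise-perturbed least-squares backward induction followed by clipping to the feasible value range. The function name, the `sigma / sqrt(counts + 1)` noise scale, and all variable names are assumptions made for illustration, not the paper's exact Algorithm 1 or its constants.

```python
import numpy as np

def c_rlsvi_plan(H, S, A, counts, r_hat, p_hat, sigma=1.0, rng=None):
    """Noise-perturbed backward induction with clipping (illustrative sketch).

    H: horizon; S, A: number of states and actions.
    counts: (S, A) visit counts from the data collected so far.
    r_hat:  (S, A) empirical mean rewards.
    p_hat:  (S, A, S) empirical transition probabilities.
    Returns an (H, S, A) array of randomized, clipped Q-values.
    """
    rng = rng or np.random.default_rng()
    Q = np.zeros((H, S, A))
    V_next = np.zeros(S)  # value at step h + 1; zero beyond the horizon
    for h in range(H - 1, -1, -1):
        # Gaussian perturbation whose scale shrinks with visit counts,
        # standing in for the randomized least-squares targets of RLSVI
        # (assumed form, not the paper's exact noise calibration).
        noise = rng.normal(0.0, sigma / np.sqrt(counts + 1.0))
        Q[h] = r_hat + p_hat @ V_next + noise
        # The clipping step: force values into the feasible range [0, H - h],
        # the modification that distinguishes C-RLSVI from plain RLSVI.
        Q[h] = np.clip(Q[h], 0.0, float(H - h))
        V_next = Q[h].max(axis=1)
    return Q

# Toy usage on a random 3-state, 2-action MDP with horizon 4.
rng = np.random.default_rng(0)
S, A, H = 3, 2, 4
p_hat = rng.dirichlet(np.ones(S), size=(S, A))        # (S, A, S) transitions
r_hat = rng.uniform(size=(S, A))                      # (S, A) mean rewards
counts = rng.integers(1, 50, size=(S, A)).astype(float)
Q = c_rlsvi_plan(H, S, A, counts, r_hat, p_hat, rng=rng)
print(Q.shape)  # (4, 3, 2)
```

The abstract credits this clipping variant with the $\tilde{O}(H^2S\sqrt{AT})$ worst-case regret bound; in an episode, the agent would act greedily with respect to `Q[h]` at each step h.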