Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration

Authors: Priyank Agrawal, Jinglin Chen, Nan Jiang (pp. 6566–6573)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies regret minimization with randomized value functions in reinforcement learning. In tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of one classical Thompson Sampling (TS)-like algorithm, randomized least-squares value iteration (RLSVI). Our $\widetilde{O}(H^2 S \sqrt{AT})$ high-probability worst-case regret bound improves the previous sharpest worst-case regret bounds for RLSVI and matches the existing state-of-the-art worst-case TS-based regret bounds.
Researcher Affiliation | Academia | Priyank Agrawal*, Jinglin Chen*, Nan Jiang; University of Illinois at Urbana-Champaign, Urbana, IL 61801; priyank4@illinois.edu, jinglinc@illinois.edu, nanjiang@illinois.edu
Pseudocode | Yes | Algorithm 1 C-RLSVI (an illustrative sketch follows this table)
Open Source Code | No | The paper is theoretical and does not mention releasing any source code for the methodology described.
Open Datasets | No | The paper is purely theoretical and does not involve the use of datasets for training.
Dataset Splits | No | The paper is purely theoretical and does not involve dataset splits for validation.
Hardware Specification | No | The paper is purely theoretical and does not describe any hardware used for experiments.
Software Dependencies | No | The paper is purely theoretical and does not list any specific software dependencies with version numbers.
Experiment Setup | No | The paper is purely theoretical and does not provide details about experimental setup, hyperparameters, or training settings.
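For readers who want a concrete picture of what an RLSVI-with-clipping algorithm such as Algorithm 1 (C-RLSVI) does, below is a minimal Python sketch of a tabular backward planning pass: each least-squares Q target receives Gaussian noise whose scale shrinks with the visit count, and the randomized values are then clipped to an achievable range. This is a sketch under our own assumptions, not the paper's Algorithm 1; the names `sigma`, `lam`, the noise scaling, and the clipping interval `[0, H - h]` are illustrative placeholders rather than the constants and clipping rule analyzed in the paper.

```python
import numpy as np

def rlsvi_with_clipping(P_hat, R_hat, counts, H, rng, sigma=1.0, lam=1.0):
    """One backward planning pass of a simplified, clipped RLSVI-style update.

    P_hat  : (H, S, A, S) empirical transition estimates
    R_hat  : (H, S, A)    empirical mean rewards
    counts : (H, S, A)    state-action visit counts n_h(s, a)
    Returns a randomized Q function of shape (H, S, A).

    Illustrative sketch only: sigma, lam, and the clipping range
    [0, H - h] are placeholder choices, not the paper's constants.
    """
    _, S, A = R_hat.shape
    Q = np.zeros((H, S, A))
    V_next = np.zeros(S)                              # V_{H+1} = 0
    for h in range(H - 1, -1, -1):
        # Least-squares target plus count-scaled Gaussian perturbation
        noise_std = sigma / np.sqrt(counts[h] + lam)  # shape (S, A)
        target = R_hat[h] + P_hat[h] @ V_next         # shape (S, A)
        Q[h] = target + noise_std * rng.standard_normal((S, A))
        # Clipping keeps the randomized values inside the achievable range
        Q[h] = np.clip(Q[h], 0.0, float(H - h))
        V_next = Q[h].max(axis=1)
    return Q

# Hypothetical usage with a tiny randomly generated empirical model
rng = np.random.default_rng(0)
H, S, A = 5, 4, 2
P_hat = rng.dirichlet(np.ones(S), size=(H, S, A))     # (H, S, A, S)
R_hat = rng.uniform(size=(H, S, A))
counts = np.ones((H, S, A))
Q = rlsvi_with_clipping(P_hat, R_hat, counts, H, rng)
```

In a full episodic loop, the agent would act greedily with respect to the returned randomized Q at each step of the episode, then refresh the empirical model and visit counts from the observed transitions before the next planning pass.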