reproducibilityindex.ai

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration

Authors: Priyank Agrawal, Jinglin Chen, Nan Jiang (pp. 6566-6573)

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	This paper studies regret minimization with randomized value functions in reinforcement learning. In tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of one classical Thompson Sampling (TS)-like algorithm, randomized least-squares value iteration (RLSVI). Our $\tilde{O}(H^2 S \sqrt{AT})$ high-probability worst-case regret bound improves the previous sharpest worst-case regret bounds for RLSVI and matches the existing state-of-the-art worst-case TS-based regret bounds.
Researcher Affiliation	Academia	Priyank Agrawal, Jinglin Chen, Nan Jiang University of Illinois at Urbana-Champaign, Urbana, IL, 61801 priyank4@illinois.edu, jinglinc@illinois.edu, nanjiang@illinois.edu
Pseudocode	Yes	Algorithm 1 C-RLSVI
Open Source Code	No	The paper is theoretical and does not mention releasing any source code for the methodology described.
Open Datasets	No	The paper is purely theoretical and does not involve the use of datasets for training.
Dataset Splits	No	The paper is purely theoretical and does not involve dataset splits for validation.
Hardware Specification	No	The paper is purely theoretical and does not describe any hardware used for experiments.
Software Dependencies	No	The paper is purely theoretical and does not list any specific software dependencies with version numbers.
Experiment Setup	No	The paper is purely theoretical and does not provide details about experimental setup, hyperparameters, or training settings.