Worst-Case Regret Bounds for Exploration via Randomized Value Functions

Authors: Daniel Russo

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | This paper develops a very different proof strategy and provides a worst-case regret bound for RLSVI applied to tabular finite-horizon MDPs. |
| Researcher Affiliation | Academia | Daniel Russo, Columbia University, djr2174@gsb.columbia.edu |
| Pseudocode | Yes | Algorithm 1: RLSVI for Tabular, Finite-Horizon MDPs |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper is theoretical and treats tabular finite-horizon MDPs as a problem formulation, rather than using specific publicly available datasets for empirical training. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe software implementations with specific versioned dependencies. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithmic analysis and proofs, so it does not describe an experimental setup with hyperparameters or training configurations. |
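Since the paper provides pseudocode but no released implementation, the flavor of Algorithm 1 (RLSVI for tabular, finite-horizon MDPs) can be conveyed with a short sketch: each episode, the agent perturbs its empirical Bellman backups with Gaussian noise whose scale shrinks as visit counts grow, then acts greedily on the randomized Q-values. This is an illustrative sketch only, not the paper's exact algorithm — the noise scale `beta / sqrt(n + 1)`, the function names, and the toy random MDP are assumptions chosen for demonstration.

```python
import numpy as np

def rlsvi_plan(counts, r_sum, p_counts, H, beta=1.0, rng=None):
    """Randomized value iteration sketch: backward induction over the
    empirical MDP, with Gaussian reward perturbations whose scale
    shrinks with visit counts (assumed noise schedule, not the paper's)."""
    rng = np.random.default_rng() if rng is None else rng
    S, A = counts.shape
    n = np.maximum(counts, 1)
    r_hat = r_sum / n                       # empirical mean rewards, (S, A)
    p_hat = p_counts / n[..., None]         # empirical transition kernel, (S, A, S)
    Q = np.zeros((H + 1, S, A))
    for h in range(H - 1, -1, -1):          # backward induction over the horizon
        V_next = Q[h + 1].max(axis=1)       # greedy value at step h+1, (S,)
        noise = rng.normal(0.0, beta / np.sqrt(counts + 1.0))
        Q[h] = r_hat + noise + p_hat @ V_next
    return Q[:H]

# Toy usage on a random 5-state, 2-action MDP with horizon 4 (illustrative).
S, A, H = 5, 2, 4
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))  # true transitions (unknown to agent)
R = rng.uniform(size=(S, A))                # true mean rewards
counts = np.zeros((S, A))
r_sum = np.zeros((S, A))
p_counts = np.zeros((S, A, S))
for episode in range(20):
    Q = rlsvi_plan(counts, r_sum, p_counts, H, rng=rng)
    s = 0
    for h in range(H):
        a = int(Q[h, s].argmax())           # act greedily w.r.t. randomized Q
        s_next = rng.choice(S, p=P[s, a])
        counts[s, a] += 1
        r_sum[s, a] += R[s, a]
        p_counts[s, a, s_next] += 1
        s = s_next
```

The per-state noise plays the exploration role that optimism bonuses play in UCB-style algorithms: rarely visited state-action pairs receive larger perturbations, so some sampled Q-functions make them look attractive and the agent tries them.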