Worst-Case Regret Bounds for Exploration via Randomized Value Functions
Authors: Daniel Russo
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The paper is purely theoretical: it develops a new proof strategy and establishes a worst-case regret bound for RLSVI applied to tabular finite-horizon MDPs. |
| Researcher Affiliation | Academia | Daniel Russo, Columbia University (djr2174@gsb.columbia.edu) |
| Pseudocode | Yes | Algorithm 1: RLSVI for Tabular, Finite-Horizon MDPs |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper is theoretical and formulates the problem over tabular finite-horizon MDPs; it does not train or evaluate on any publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and describes no experiments that would require specific hardware; no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe software implementations with specific versioned dependencies. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithmic analysis and proofs; it does not describe an experimental setup with hyperparameters or training configurations. |
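For context on the pseudocode row above: the paper's Algorithm 1 is RLSVI for tabular finite-horizon MDPs, which runs a randomized Bellman backup (value iteration on empirical estimates perturbed by Gaussian noise) before each episode and then acts greedily. The following is a simplified, hedged sketch of that idea, not the paper's exact algorithm; the function name, array layout, and the `sigma / sqrt(n + 1)` noise scaling are illustrative assumptions.

```python
import numpy as np

def rlsvi_plan(counts, r_sum, p_counts, sigma=1.0, rng=None):
    """Simplified sketch of one tabular RLSVI planning pass.

    counts   : (H, S, A)    visit counts per stage, state, action
    r_sum    : (H, S, A)    summed observed rewards
    p_counts : (H, S, A, S) observed next-state counts
    Returns randomized Q-values of shape (H, S, A); the agent then
    acts greedily with respect to them for one episode.
    """
    rng = rng if rng is not None else np.random.default_rng()
    H, S, A = counts.shape
    Q = np.zeros((H, S, A))
    V = np.zeros(S)  # value at stage h+1 (terminal value is zero)
    for h in range(H - 1, -1, -1):
        n = np.maximum(counts[h], 1)      # avoid division by zero
        r_hat = r_sum[h] / n              # empirical mean reward
        pv = p_counts[h] @ V / n          # empirical E[V(s') | s, a]
        # Gaussian perturbation, shrinking as data accumulates
        # (noise scale is an illustrative choice, not the paper's tuning)
        noise = rng.normal(0.0, sigma / np.sqrt(counts[h] + 1.0))
        Q[h] = r_hat + pv + noise         # perturbed Bellman backup
        V = Q[h].max(axis=1)              # greedy value for stage h
    return Q
```

Acting greedily on these randomized Q-values induces exploration: state-action pairs with few visits receive larger perturbations, so they are occasionally tried, which is the mechanism the paper's regret analysis quantifies.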