Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
Authors: Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also run numerical simulations to empirically compare SSR and RLSVI in the deep-sea environment, which is commonly used as a benchmark to test an algorithm's ability to explore. The results show that SSR significantly outperforms RLSVI, as predicted by our regret analysis. More details about our experiment can be found in Appendix J. |
| Researcher Affiliation | Academia | 1 Paul G. Allen School of Computer Science & Engineering, University of Washington 2 Department of Electrical & Computer Engineering, University of Washington |
| Pseudocode | Yes | Algorithm 1: Single Seed Randomization (SSR) |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | No | The paper mentions using the "deep-sea environment" for numerical simulations, but it does not provide concrete access information (e.g., a link, DOI, or specific citation with authors/year) for a publicly available dataset or environment implementation used in the experiments. |
| Dataset Splits | No | The paper does not specify exact train/validation/test dataset splits. It presents theoretical bounds and an empirical comparison in a simulated environment, but no data-partitioning details. |
| Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No] |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper does not provide specific hyperparameters or system-level training settings in the main text. It states that "More details about our experiment can be found in Appendix J," but Appendix J is not included in the provided text. |