Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

Authors: Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also run numerical simulations to empirically compare SSR and RLSVI in the deep-sea environment, which is commonly used as a benchmark to test an algorithm's ability to explore. The results show that SSR significantly outperforms RLSVI, as predicted by our regret analysis. More details about our experiment can be found in Appendix J.
Researcher Affiliation | Academia | 1 Paul G. Allen School of Computer Science & Engineering, University of Washington; 2 Department of Electrical & Computer Engineering, University of Washington
Pseudocode | Yes | Algorithm 1: Single Seed Randomization (SSR)
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]
Open Datasets | No | The paper mentions using the "deep-sea environment" for numerical simulations, but it does not provide concrete access information (e.g., a link, DOI, or a specific citation with authors and year) for a publicly available dataset.
Dataset Splits | No | The paper does not specify train/validation/test splits. It reports theoretical bounds and an empirical comparison in a simulated environment, with no data-partitioning details.
Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]
Software Dependencies | No | The paper does not list software dependencies with version numbers.
Experiment Setup | No | The paper does not give specific hyperparameters or system-level training settings in the main text. It refers to Appendix J for experimental details, but Appendix J is not included in the provided text.
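For context on the benchmark named in the table above: the deep-sea environment is a hard-exploration gridworld. A minimal sketch is below, modeled on the commonly used bsuite-style variant; the grid size, penalty scaling, and class/method names here are illustrative assumptions, not the paper's exact experimental setup.

```python
class DeepSea:
    """Illustrative sketch of a deep-sea exploration benchmark.

    The agent starts at the top-left of an N x N grid and descends one row
    per step. Action 1 ("right") moves one column right at a small cost;
    action 0 ("left") moves one column left for free. Only the bottom-right
    corner pays reward 1, so the optimal policy must move right on every
    step despite the per-step penalty; undirected (dithering) exploration
    needs exponentially many episodes to discover the reward.
    """

    def __init__(self, size=10, total_move_cost=0.01):
        self.size = size
        # Spread the penalty over the episode so the optimal return is
        # 1 - total_move_cost (an assumed, conventional scaling).
        self.move_cost = total_move_cost / size
        self.reset()

    def reset(self):
        self.row, self.col = 0, 0
        return (self.row, self.col)

    def step(self, action):
        reward = 0.0
        if action == 1:  # swim right, paying the small penalty
            reward -= self.move_cost
            self.col = min(self.col + 1, self.size - 1)
        else:            # drift left for free
            self.col = max(self.col - 1, 0)
        self.row += 1
        done = self.row == self.size
        if done and self.col == self.size - 1:
            reward += 1.0  # treasure at the bottom-right corner
        return (self.row, self.col), reward, done
```

Under this scaling, the always-right policy earns a return of 1 − 0.01 = 0.99, while any policy that ever drifts left earns at most 0, which is what makes the environment a sharp test of directed exploration.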