
Model-Free Active Exploration in Reinforcement Learning

Authors: Alessio Russo, Alexandre Proutiere

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical results demonstrate that our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches. ... Numerical results on hard-exploration problems highlighted the effectiveness of our approach for learning efficient policies over state-of-the-art methods.
Researcher Affiliation	Academia	Alessio Russo, Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, SE; Alexandre Proutiere, Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, SE
Pseudocode	Yes	Algorithm 1 Bootstrapped MF-BPI (Bootstrapped Model Free Best Policy Identification) ... Algorithm 2 DBMF-BPI (Deep Bootstrapped Model Free BPI)
Open Source Code	Yes	Code repository: https://github.com/rssalessio/ModelFreeActiveExplorationRL
Open Datasets	Yes	For the tabular MDPs, we test the performance of MF-BPI on the Riverswim and the Forked Riverswim environments... For continuous state-spaces, we compare our algorithm to IDS [33] and BSP [39]... and assess their performance on hard-exploration problems from the DeepMind BSuite [41] (the Deep Sea and the Cartpole swingup problems).
Dataset Splits	No	The paper does not give training, validation, or test splits with percentages or sample counts. It uses standard RL environments and evaluates performance after T steps or episodes, so dataset splits in the supervised-learning sense do not apply.
Hardware Specification	No	The paper does not provide specific details about the hardware used for the experiments (e.g., CPU, GPU models, or memory specifications).
Software Dependencies	No	The paper does not provide specific version numbers for ancillary software or libraries used in the experiments.
Experiment Setup	Yes	Require: Parameters (λ, k, p); ensemble size B; learning rates {(α_t, β_t)}_t. ... Details of the experiments, including the initialization of the algorithms, are provided in Appendix A.