Model-Free Active Exploration in Reinforcement Learning

Authors: Alessio Russo, Alexandre Proutiere

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical results demonstrate that our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches. ... Numerical results on hard-exploration problems highlighted the effectiveness of our approach for learning efficient policies over state-of-the-art methods.
Researcher Affiliation | Academia | Alessio Russo, Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, SE; Alexandre Proutiere, Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, SE
Pseudocode | Yes | Algorithm 1 Bootstrapped MF-BPI (Bootstrapped Model Free Best Policy Identification) ... Algorithm 2 DBMF-BPI (Deep Bootstrapped Model Free BPI) — see the generic bootstrapped-ensemble sketch after the table.
Open Source Code | Yes | Code repository: https://github.com/rssalessio/ModelFreeActiveExplorationRL
Open Datasets | Yes | For the tabular MDPs, we test the performance of MF-BPI on the Riverswim and the Forked Riverswim environments... For continuous state-spaces, we compare our algorithm to IDS [33] and BSP [39]... and assess their performance on hard-exploration problems from the DeepMind bsuite [41] (the Deep Sea and the Cartpole swingup problems) — see the environment-loading example after the table.
Dataset Splits | No | The paper does not provide explicit training, validation, and test splits with specific percentages or sample counts. It evaluates on standard environments after T steps or episodes, as is typical in RL, rather than using supervised-learning-style dataset splits.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., CPU or GPU models, or memory).
Software Dependencies | No | The paper does not provide specific version numbers for the ancillary software or libraries used in the experiments.
Experiment Setup | Yes | Require: Parameters (λ, k, p); ensemble size B; learning rates {(α_t, β_t)}_t. ... Details of the experiments, including the initialization of the algorithms, are provided in Appendix A. — an illustrative hyperparameter listing follows the table.
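
For readers unfamiliar with the bootstrapping device referenced in Algorithms 1 and 2, below is a minimal, generic bootstrapped Q-ensemble skeleton for a tabular MDP. It is a sketch of the general technique only: the class name, hyperparameter values, and update rule are assumptions, and it does not implement the MF-BPI exploration allocation described in the paper.

# Generic bootstrapped Q-ensemble skeleton (illustrative only; not the authors' MF-BPI).
import numpy as np

class BootstrappedQEnsemble:
    def __init__(self, n_states, n_actions, B=10, p=0.5, discount=0.99, seed=0):
        self.rng = np.random.default_rng(seed)
        self.q = np.zeros((B, n_states, n_actions))  # one Q-table per ensemble member
        self.B, self.p, self.discount = B, p, discount

    def act(self, state):
        # Sample one member uniformly and act greedily w.r.t. it (Thompson-style exploration).
        member = self.rng.integers(self.B)
        return int(np.argmax(self.q[member, state]))

    def update(self, s, a, r, s_next, alpha):
        # Each member is updated only with probability p (bootstrap masking).
        mask = self.rng.random(self.B) < self.p
        target = r + self.discount * self.q[:, s_next].max(axis=1)  # shape (B,)
        self.q[mask, s, a] += alpha * (target[mask] - self.q[mask, s, a])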
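The hard-exploration tasks cited in the Open Datasets row come from DeepMind's bsuite. A minimal way to instantiate them, assuming the standard bsuite package and its default environment ids (the exact ids and configurations used by the authors may differ), is:

# Load the bsuite hard-exploration environments referenced above (illustrative only).
import bsuite

for bsuite_id in ("deep_sea/0", "cartpole_swingup/0"):
    env = bsuite.load_from_id(bsuite_id)  # returns a dm_env.Environment
    timestep = env.reset()
    print(bsuite_id, env.observation_spec(), env.action_spec())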
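As a convenience for reproduction, the hyperparameters listed in the Experiment Setup row can be gathered in a single configuration object. This is a hypothetical sketch: the field names, the roles attributed to λ, k, and p, and the default values are assumptions and should be replaced with the values given in Appendix A of the paper.

# Hypothetical configuration container; roles and defaults are assumed, not from the paper.
from dataclasses import dataclass
from typing import Callable

@dataclass
class MFBPIConfig:
    lam: float = 0.1            # λ: exploration-mixing parameter (role and value assumed)
    k: float = 2.0              # k: exponent/moment parameter (role and value assumed)
    p: float = 0.5              # p: bootstrap masking probability (role and value assumed)
    ensemble_size: int = 10     # B: number of bootstrapped estimators (value assumed)
    alpha: Callable[[int], float] = lambda t: (t + 1) ** -0.6  # α_t schedule (placeholder)
    beta: Callable[[int], float] = lambda t: (t + 1) ** -0.8   # β_t schedule (placeholder)

config = MFBPIConfig()
print(config.alpha(100), config.beta(100))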