Model-Free Active Exploration in Reinforcement Learning
Authors: Alessio Russo, Alexandre Proutiere
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results demonstrate that our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches. ... Numerical results on hard-exploration problems highlighted the effectiveness of our approach for learning efficient policies over state-of-the-art methods. |
| Researcher Affiliation | Academia | Alessio Russo Division of Decision and Control Systems KTH Royal Institute of Technology Stockholm, SE Alexandre Proutiere Division of Decision and Control Systems KTH Royal Institute of Technology Stockholm, SE |
| Pseudocode | Yes | Algorithm 1 Bootstrapped MF-BPI (Bootstrapped Model Free Best Policy Identification) ... Algorithm 2 DBMF-BPI (Deep Bootstrapped Model Free BPI) |
| Open Source Code | Yes | Code repository: https://github.com/rssalessio/ModelFreeActiveExplorationRL |
| Open Datasets | Yes | For the tabular MDPs, we test the performance of MF-BPI on the Riverswim and the Forked Riverswim environments... For continuous state-spaces, we compare our algorithm to IDS [33] and BSP [39]... and assess their performance on hard-exploration problems from the DeepMind bsuite [41] (the Deep Sea and the Cartpole swingup problems). |
| Dataset Splits | No | The paper does not explicitly provide training, validation, and test dataset splits with specific percentages or sample counts. It reports evaluation on standard environments after T steps or episodes, which is typical for RL but does not constitute dataset splits in the conventional supervised-learning sense. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., CPU, GPU models, or memory specifications). |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software or libraries used in the experiments. |
| Experiment Setup | Yes | Require: Parameters (λ, k, p); ensemble size B; learning rates {(α_t, β_t)}_t. ... Details of the experiments, including the initialization of the algorithms, are provided in Appendix A. |
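
The Pseudocode and Experiment Setup rows quote Algorithm 1, Bootstrapped MF-BPI, which maintains an ensemble of B bootstrapped estimators updated with step sizes α_t and β_t. The paper's actual exploration allocation is not reproduced here; the sketch below only illustrates the generic masked-ensemble Q-update that bootstrapped exploration methods of this kind build on. All names (`masked_ensemble_q_update`, `p_mask`) and default values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def masked_ensemble_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99,
                             p_mask=0.5, rng=None):
    """One bootstrapped (masked) Q-learning update.

    Q has shape (B, S, A): one tabular Q-estimate per ensemble member.
    Each member sees the transition with probability p_mask, which keeps
    the B estimates diverse and yields an uncertainty signal for exploration.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(Q.shape[0]) < p_mask      # which members learn from this step
    for b in np.flatnonzero(mask):
        td_target = r + gamma * Q[b, s_next].max()
        Q[b, s, a] += alpha * (td_target - Q[b, s, a])
    return Q
```

How the ensemble statistics are turned into an exploration policy, and the actual values of (λ, k, p), B, and the learning rates, are specified in the paper and its Appendix A.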
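
The Open Datasets row references hard-exploration tasks from DeepMind's bsuite. As a hedged illustration only, these environments can be instantiated through the public `bsuite` package; the sweep IDs below (`deep_sea/0`, `cartpole_swingup/0`) are assumed defaults, and the environment sizes, seeds, and evaluation protocol actually used are described in the paper's Appendix A.

```python
import bsuite

# Roll out a trivial fixed-action policy on the two bsuite tasks named above.
for bsuite_id in ('deep_sea/0', 'cartpole_swingup/0'):
    env = bsuite.load_from_id(bsuite_id)    # a dm_env.Environment
    timestep = env.reset()
    episode_return = 0.0
    while not timestep.last():
        timestep = env.step(0)              # placeholder policy: always take action 0
        episode_return += timestep.reward or 0.0
    print(bsuite_id, 'return of the fixed-action policy:', episode_return)
```

Riverswim and Forked Riverswim are not bsuite tasks; they are the tabular MDPs described in the paper itself.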