Efficient Exploration in Resource-Restricted Reinforcement Learning
Authors: Zhihai Wang, Taoxing Pan, Qi Zhou, Jie Wang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that the proposed RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments, improving the sample efficiency by up to an order of magnitude. |
| Researcher Affiliation | Academia | (1) CAS Key Laboratory of Technology in GIPAS, University of Science and Technology of China; (2) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center |
| Pseudocode | No | Due to limited space, we summarize the procedure of RAEB in Appendix B. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | To compare RAEB with the baselines, we design a range of robotic delivery and autonomous electric robot tasks based on Gym (Brockman et al. 2016) and Mujoco (Todorov, Erez, and Tassa 2012). |
| Dataset Splits | No | The paper mentions evaluating policies every 10000 training steps but does not specify details about dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions Gym and Mujoco as platforms but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes (see the configuration sketch after the table) | For all environments, we use the intrinsic reward coefficient β = 0.25. We use α = 0.25·I_max for delivery tasks, α = 2.5·I_max for tasks with limited electricity, and α ∈ [0.25·I_max, 2.5·I_max] for delivery tasks with limited electricity. |
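As a rough illustration of the quoted setup, the sketch below instantiates a standard MuJoCo locomotion task through Gym and collects the reported RAEB hyperparameters into a plain config dict. The choice of `Ant-v3`, the `I_max` value, and all dictionary keys are assumptions made for illustration; they are not taken from the authors' code, which is not released.

```python
# Hypothetical sketch of the reported experiment setup. Only the numeric values
# (beta = 0.25, alpha scaling, 10000-step evaluation interval) come from the paper;
# everything else (env id, I_max, key names) is an illustrative assumption.
import gym

# Base MuJoCo task built with Gym; the paper layers delivery / limited-electricity
# variants on top of tasks like this.
env = gym.make("Ant-v3")

I_max = 1.0  # assumed per-episode resource budget (e.g., carried goods or battery)

raeb_config = {
    "intrinsic_reward_coef": 0.25,                # beta = 0.25 for all environments
    "alpha_delivery": 0.25 * I_max,               # alpha for delivery tasks
    "alpha_electricity": 2.5 * I_max,             # alpha for tasks with limited electricity
    "alpha_delivery_electricity": (0.25 * I_max,  # alpha range for delivery tasks
                                   2.5 * I_max),  # with limited electricity
    "eval_every_steps": 10_000,                   # policies evaluated every 10000 training steps
}
```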