Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Efficient Exploration in Resource-Restricted Reinforcement Learning
Authors: Zhihai Wang, Taoxing Pan, Qi Zhou, Jie Wang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that the proposed RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments, improving the sample efficiency by up to an order of magnitude. |
| Researcher Affiliation | Academia | ¹CAS Key Laboratory of Technology in GIPAS, University of Science and Technology of China; ²Institute of Artificial Intelligence, Hefei Comprehensive National Science Center |
| Pseudocode | No | Due to limited space, we summarize the procedure of RAEB in Appendix B. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | To compare RAEB with the baselines, we design a range of robotic delivery and autonomous electric robot tasks based on Gym (Brockman et al. 2016) and Mujoco (Todorov, Erez, and Tassa 2012). |
| Dataset Splits | No | The paper mentions evaluating policies every 10000 training steps but does not specify details about dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions Gym and Mujoco as platforms but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For all environments, we use the intrinsic reward coefficient β = 0.25. We use α = 0.25 I_max for delivery tasks, α = 2.5 I_max for tasks with limited electricity, and α = [0.25 I_max, 2.5 I_max] for delivery tasks with limited electricity. |
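
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is only an illustration, not the authors' code: the task-family keys and the `alpha_values` helper are hypothetical names, the value of I_max is not given in this record and is therefore left as an argument, and reading the bracketed α for the combined tasks as a pair of values is an assumption.

```python
# Intrinsic reward coefficient, stated as 0.25 for all environments.
BETA = 0.25

# alpha expressed as multiples of I_max. The dictionary keys are
# hypothetical labels for the three task families named in the table.
ALPHA_MULTIPLIERS = {
    "delivery": (0.25,),
    "limited_electricity": (2.5,),
    # Assumption: the bracketed value is read as a pair of coefficients.
    "delivery_limited_electricity": (0.25, 2.5),
}


def alpha_values(task_family: str, i_max: float) -> tuple:
    """Return the alpha value(s) for a task family, scaled by i_max.

    i_max is supplied by the caller because its definition and value
    are not part of this record.
    """
    multipliers = ALPHA_MULTIPLIERS[task_family]
    return tuple(m * i_max for m in multipliers)
```

For example, `alpha_values("delivery", i_max)` returns a one-element tuple holding 0.25 times whatever `i_max` is passed in; an unknown task family raises `KeyError` rather than falling back silently to a default.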