Explicit Planning for Efficient Exploration in Reinforcement Learning
Authors: Liangpeng Zhang, Ke Tang, Xin Yao
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The analysis not only points out the weakness of existing heuristic-based strategies, but also suggests a remarkable potential in explicit planning for exploration. |
| Researcher Affiliation | Academia | (1) CERCIA, School of Computer Science, University of Birmingham, U.K.; (2) Shenzhen Key Laboratory of Computational Intelligence, University Key Laboratory of Evolving Intelligent Systems of Guangdong Province, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China |
| Pseudocode | Yes | Algorithm 1: Value Iteration for Exploration Cost (VIEC). Input: Demand matrix D_in, transition P. Output: Exploration scheme ψ. |
| Open Source Code | No | The paper does not provide any concrete access information for source code. |
| Open Datasets | No | The paper describes theoretical constructs like 'tower MDPs' for analysis but does not use or provide access information for a publicly available or open dataset for empirical training. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) as it focuses on theoretical analysis rather than empirical experiments with datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. This is consistent with a theoretical paper. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. This is consistent with a theoretical paper that does not describe empirical experiments requiring specific software environments for replication. |
| Experiment Setup | No | The paper does not provide specific experimental setup details such as hyperparameter values or training configurations, as it focuses on theoretical analysis rather than empirical experiments. |