Potential Driven Reinforcement Learning for Hard Exploration Tasks
Authors: Enmin Zhao, Shihong Deng, Yifan Zang, Yongxin Kang, Kai Li, Junliang Xing
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental analyses and comparisons on multiple challenging hard exploration environments have verified its effectiveness and efficiency." and "To verify the efficiency of the PotER sampling algorithm, we evaluate its performance on two separate domains: 1) a simple maze with discrete state space (Fig. 3 left) and 2) the hard exploration Atari games with continuous state space." |
| Researcher Affiliation | Academia | 1 Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences; {zhaoenmin2018, shihong.deng, zangyifan2019, kangyongxin2018, kai.li, junliang.xing}@ia.ac.cn |
| Pseudocode | Yes | Algorithm 1: PotER based RL with SIL. |
| Open Source Code | Yes | The source code of this work is available at https://github.com/ZhaoEnMin/PotER. |
| Open Datasets | No | The paper mentions standard Atari games like 'Montezuma's Revenge', 'Freeway', 'Gravitar', and 'Private Eye' as experimental environments, but it does not provide concrete access information (specific links, DOIs, repositories, or formal citations) for datasets (e.g., ROMs or specific data files) or the custom maze environment used. |
| Dataset Splits | No | The paper specifies the number of random seeds used for experiments (e.g., '3 random seeds' for maze, '5 random seeds in 50M time steps' for Atari) but does not provide explicit train/validation/test dataset splits, percentages, or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or other detailed computer specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow, etc.) used in the experiments. |
| Experiment Setup | Yes | "The main hyper-parameters of our algorithm are the number of iterations used to set goals N, the influence distance of the repulsive potential field do, the attractive parameter ka and the repulsive parameter kr. Because in different games, agents have different average steps to lose health, we set N as 50... we set kr to and ka to any positive value... In specific, for the maze games, we set do to 1. For the Atari games, we set do to 10." and "In the Atari experiments, we convert the 84×84 input RGB frames to gray-scale images. The input of the convolutional neural networks... are the last 4 stacked gray-scale frames. For the SIL and SIL+PotER algorithms, we perform four SIL updates in training each model." (Illustrative sketches of the potential-field form and the frame preprocessing follow the table.) |
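
The hyper-parameters quoted above (the attractive parameter ka, the repulsive parameter kr, and the influence distance do of the repulsive field) follow the conventions of artificial potential fields. The sketch below shows the classic quadratic-attractive / bounded-repulsive form such fields usually take; it is an assumption for illustration, not the authors' exact definition, and the function names are invented here.

```python
def attractive_potential(dist_to_goal: float, ka: float) -> float:
    """Quadratic attractive potential pulling the agent toward the goal."""
    return 0.5 * ka * dist_to_goal ** 2


def repulsive_potential(dist_to_obstacle: float, kr: float, do: float) -> float:
    """Repulsive potential that only acts within the influence distance do."""
    if dist_to_obstacle <= 0.0 or dist_to_obstacle > do:
        return 0.0
    return 0.5 * kr * (1.0 / dist_to_obstacle - 1.0 / do) ** 2


# With do = 1 (the maze setting quoted above), an obstacle at distance 0.5
# contributes repulsion, while one at distance 2.0 contributes none.
print(attractive_potential(3.0, ka=1.0))         # 4.5
print(repulsive_potential(0.5, kr=1.0, do=1.0))  # 0.5
print(repulsive_potential(2.0, kr=1.0, do=1.0))  # 0.0
```

Under this form, any positive ka merely scales the attraction without changing which states are preferred, which is consistent with the paper's remark that ka can be set to any positive value.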
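
The Atari input pipeline described in the setup (gray-scale conversion and a stack of the last 4 frames as CNN input) is the standard one; a minimal sketch follows. The cv2-based resize, the class name, and the channel-first layout are assumptions for illustration, not taken from the authors' code.

```python
from collections import deque

import cv2
import numpy as np

FRAME_SIZE = 84  # frames are 84x84, as stated in the setup
STACK_LEN = 4    # the CNN input is the last 4 stacked gray-scale frames


def preprocess(frame_rgb: np.ndarray) -> np.ndarray:
    """Convert one RGB frame to an 84x84 gray-scale image."""
    gray = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (FRAME_SIZE, FRAME_SIZE), interpolation=cv2.INTER_AREA)


class FrameStack:
    """Maintains the stack of the most recent preprocessed frames."""

    def __init__(self):
        self.frames = deque(maxlen=STACK_LEN)

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        # Fill the stack with copies of the first frame at episode start.
        processed = preprocess(first_frame)
        for _ in range(STACK_LEN):
            self.frames.append(processed)
        return np.stack(self.frames, axis=0)  # shape (4, 84, 84)

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess(frame))
        return np.stack(self.frames, axis=0)
```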