Potential Driven Reinforcement Learning for Hard Exploration Tasks

Authors: Enmin Zhao, Shihong Deng, Yifan Zang, Yongxin Kang, Kai Li, Junliang Xing

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental analyses and comparisons on multiple challenging hard exploration environments have verified its effectiveness and efficiency." and "To verify the efficiency of the PotER sampling algorithm, we evaluate its performance on two separate domains: 1) a simple maze with discrete state space (Fig. 3, left) and 2) the hard exploration Atari games with continuous state space."
Researcher Affiliation | Academia | "1 Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences; {zhaoenmin2018, shihong.deng, zangyifan2019, kangyongxin2018, kai.li, junliang.xing}@ia.ac.cn"
Pseudocode | Yes | "Algorithm 1: PotER based RL with SIL." (a hedged sketch of such a training loop is given after this table)
Open Source Code | Yes | "The source code of this work is available at https://github.com/ZhaoEnMin/PotER."
Open Datasets | No | The paper mentions standard Atari games such as 'Montezuma's Revenge', 'Freeway', 'Gravitar', and 'Private Eye' as experimental environments, but it does not provide concrete access information (specific links, DOIs, repositories, or formal citations) for datasets (e.g., ROMs or specific data files) or for the custom maze environment used.
Dataset Splits | No | The paper specifies the number of random seeds used for the experiments (e.g., '3 random seeds' for the maze, '5 random seeds in 50M time steps' for Atari) but does not provide explicit train/validation/test dataset splits, percentages, or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory sizes, or other system specifications used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow) used in the experiments.
Experiment Setup | Yes | "The main hyper-parameters of our algorithm are the number of iterations used to set goals N, the influence distance of the repulsive potential field d_o, the attractive parameter k_a and the repulsive parameter k_r. Because in different games, agents have different average steps to lose health, we set N as 50... we set k_r to ... and k_a to any positive value... In specific, for the maze games, we set d_o to 1. For the Atari games, we set d_o to 10." and "In the Atari experiments, we convert the 84×84 input RGB frames to gray-scale images. The input of the convolutional neural networks... are the last 4 stacked gray-scale frames. For the SIL and SIL+PotER algorithms, we perform four SIL updates in training each model."
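
The quoted setup is concrete enough to sketch in code. The following Python snippet is a minimal illustration, not the authors' released implementation: it assumes OpenCV and NumPy for the 84×84 gray-scale conversion and 4-frame stacking, and it collects the quoted hyper-parameters in a dictionary. The names (to_grayscale_84x84, FrameStacker, HYPERPARAMS) are hypothetical, and the k_a / k_r values are placeholders, since the quote only constrains them to be positive and elides the k_r value.

```python
# Minimal sketch of the quoted Atari preprocessing; assumes OpenCV and NumPy.
# Function/class names are illustrative, not taken from the authors' code.
from collections import deque

import cv2
import numpy as np


def to_grayscale_84x84(rgb_frame):
    """Convert one RGB Atari frame to a single 84x84 gray-scale image."""
    gray = cv2.cvtColor(rgb_frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)


class FrameStacker:
    """Keep the last 4 gray-scale frames as the network input (84x84x4)."""

    def __init__(self, num_frames=4):
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_rgb_frame):
        frame = to_grayscale_84x84(first_rgb_frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return np.stack(list(self.frames), axis=-1)

    def step(self, rgb_frame):
        self.frames.append(to_grayscale_84x84(rgb_frame))
        return np.stack(list(self.frames), axis=-1)


# Hyper-parameters taken from the quoted setup; k_a and k_r are placeholders
# because the quote only requires them to be positive (the k_r value is elided).
HYPERPARAMS = {
    "goal_setting_iterations_N": 50,          # iterations used to set goals
    "repulsive_influence_distance_d_o": 10,   # 1 for the maze games, 10 for Atari
    "attractive_parameter_k_a": 1.0,          # placeholder: "any positive value"
    "repulsive_parameter_k_r": 1.0,           # placeholder: value elided in quote
    "sil_updates_per_model_update": 4,
}
```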
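Algorithm 1 (PotER based RL with SIL) is only named above, not reproduced. The sketch below shows how the quoted "four SIL updates in training each model" could look, following the standard self-imitation learning loss of Oh et al. (2018); it is an assumption-laden illustration, not the paper's algorithm. It assumes PyTorch, a policy_value_net that returns action logits and a state value, and a replay buffer whose sample method stands in for the paper's potential-driven (PotER) sampling; all of these names and the batch size and value coefficient are hypothetical.

```python
# Hedged sketch of the "four SIL updates" step; the self-imitation loss follows
# Oh et al. (2018). `policy_value_net` and `buffer` are hypothetical interfaces;
# in PotER the buffer would sample with potential-driven priorities.
import torch.nn.functional as F


def sil_updates(policy_value_net, optimizer, buffer,
                batch_size=64, value_coef=0.01, num_updates=4):
    """Run the self-imitation updates performed after each model update."""
    for _ in range(num_updates):
        # Assumed buffer API: past states, chosen actions, and observed returns.
        states, actions, returns = buffer.sample(batch_size)

        logits, values = policy_value_net(states)
        log_probs = F.log_softmax(logits, dim=-1)
        chosen_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)

        # Learn only from transitions whose return exceeds the current value.
        clipped_advantage = (returns - values.squeeze(1)).clamp(min=0)

        policy_loss = -(chosen_log_probs * clipped_advantage.detach()).mean()
        value_loss = 0.5 * (clipped_advantage ** 2).mean()
        loss = policy_loss + value_coef * value_loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```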