Incremental Reinforcement Learning with Dual-Adaptive ε-Greedy Exploration

Authors: Wei Ding, Siyang Jiang, Hsi-Wen Chen, Ming-Syan Chen

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed framework can efficiently learn the unseen transitions in new environments, leading to notable performance improvement, i.e., an average of more than 80%, over eight baselines examined. |
| Researcher Affiliation | Academia | Graduate Institute of Electrical Engineering, National Taiwan University, Taiwan {wding, syjiang, hwchen}@arbor.ee.ntu.edu.tw, mschen@ntu.edu.tw |
| Pseudocode | Yes | The detailed pseudo codes are presented in Algorithm 1. |
| Open Source Code | Yes | https://github.com/weiding98/DAE |
| Open Datasets | Yes | Furthermore, we release a new testbed based on an exponential-growing environment and the Atari benchmark (Mnih et al. 2013) to evaluate the efficiency of any algorithms under Incremental RL, including the one we proposed, DAE. |
| Dataset Splits | No | The paper describes evaluation frequency during training ('The agent will be trained for 200K steps in total and be tested every 2K steps.' and 'Evaluation will be conducted after every 1M training steps'), but it does not specify fixed training/validation/test dataset splits with percentages or sample counts, a practice common in supervised learning but less so in reinforcement learning with dynamic environments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | The length of the course N is set to 10 and the environment resets every 500 steps or if the max total reward is reached. The agent will be trained for 200K steps in total and be tested every 2K steps. |
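As context for the Experiment Setup row, the quoted schedule can be read as a simple training-and-evaluation loop. The sketch below is only an illustration under assumed interfaces: a generic Gym-style `env`, an `agent` with `act`/`update` methods, and a hypothetical `eval_fn` callback and `max_total_reward` threshold. It is not taken from the DAE repository.

```python
# Minimal sketch of the quoted schedule: 200K training steps, evaluation every
# 2K steps, and an environment reset every 500 steps or when the max total
# reward is reached. All interfaces here are assumptions, not the authors' code.
TOTAL_STEPS = 200_000
EVAL_EVERY = 2_000
RESET_EVERY = 500

def train(env, agent, max_total_reward, eval_fn):
    obs = env.reset()
    episode_steps, episode_return = 0, 0.0
    for step in range(1, TOTAL_STEPS + 1):
        action = agent.act(obs)
        next_obs, reward, done, info = env.step(action)
        agent.update(obs, action, reward, next_obs, done)
        obs = next_obs
        episode_steps += 1
        episode_return += reward
        # Reset on episode end, at the 500-step cap, or once the max total reward is hit.
        if done or episode_steps >= RESET_EVERY or episode_return >= max_total_reward:
            obs = env.reset()
            episode_steps, episode_return = 0, 0.0
        if step % EVAL_EVERY == 0:
            eval_fn(agent)  # hypothetical evaluation callback
```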
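For readers unfamiliar with the exploration strategy named in the title, the following is a sketch of plain ε-greedy action selection with a simple linear decay schedule. It illustrates only the standard baseline idea; the paper's dual-adaptive ε schedule is specified in its Algorithm 1 and the linked repository and is not reproduced here.

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    # Explore with probability epsilon, otherwise exploit the greedy action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def linear_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=100_000):
    # Illustrative linear decay only; NOT the paper's dual-adaptive rule.
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```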