Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Incremental Reinforcement Learning with Dual-Adaptive ε-Greedy Exploration

Authors: Wei Ding, Siyang Jiang, Hsi-Wen Chen, Ming-Syan Chen

AAAI 2023 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results demonstrate that the proposed framework can efficiently learn the unseen transitions in new environments, leading to notable performance improvement, i.e., an average of more than 80%, over eight baselines examined. |
| Researcher Affiliation | Academia | Graduate Institute of Electrical Engineering, National Taiwan University, Taiwan EMAIL, EMAIL |
| Pseudocode | Yes | The detailed pseudo codes are presented in Algorithm 1. |
| Open Source Code | Yes | https://github.com/weiding98/DAE |
| Open Datasets | Yes | Furthermore, we release a new testbed based on an exponential-growing environment and the Atari benchmark (Mnih et al. 2013) to evaluate the efficiency of any algorithms under Incremental RL, including the one we proposed, DAE. |
| Dataset Splits | No | The paper describes evaluation frequency during training ('The agent will be trained for 200K steps in total and be tested every 2K steps.' and 'Evaluation will be conducted after every 1M training steps'), but it does not specify fixed training/validation/test dataset splits with percentages or sample counts, which is common in supervised learning but less so in reinforcement learning with dynamic environments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | The length of the course N is set to 10 and the environment resets every 500 steps or if the max total reward is reached. The agent will be trained for 200K steps in total and be tested every 2K steps. |