Smoothing Advantage Learning

Authors: Yaozhong Gan, Zhe Zhang, Xiaoyang Tan (pp. 6657-6664)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we present our experimental results conducted over six games (LunarLander, Asterix, Breakout, Space Invaders, Seaquest, Freeway) from Gym (Brockman et al. 2016) and MinAtar (Young and Tian 2019). In addition, we also run some experiments on Atari games in the Appendix."
Researcher Affiliation | Academia | College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
Pseudocode | Yes | "Algorithm 1 gives the detailed implementation pipeline in the Appendix."
Open Source Code | No | The paper does not provide a link to open-source code for the methodology described, nor does it explicitly state that the code is made publicly available.
Open Datasets | Yes | "In this section, we present our experimental results conducted over six games (LunarLander, Asterix, Breakout, Space Invaders, Seaquest, Freeway) from Gym (Brockman et al. 2016) and MinAtar (Young and Tian 2019)." (See the environment sketch below the table.)
Dataset Splits | No | The paper mentions test procedures but does not specify training, validation, or test dataset splits (e.g., percentages or sample counts) for the environments used.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | "Particularly, we choose α from the set of {0.2, 0.3, 0.5, 0.9} for AL (Bellemare et al. 2016). For SAL, we choose ω and α among {0.2, 0.3, 0.5, 0.9}, but the hyperparameters satisfy α < ω. For Munchausen-DQN (M-DQN) (Vieillard, Pietquin, and Geist 2020), we fix τ = 0.03 and choose α from the set of {0.2, 0.3, 0.5, 0.9}." (See the hyperparameter sketch below the table.)
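
As a reference for the Open Datasets row, below is a minimal sketch of how the six listed environments could be instantiated, assuming the standard Gym and MinAtar Python APIs. The exact environment ids ("LunarLander-v2" and the lowercase MinAtar game names) are assumptions following each library's naming conventions, not ids quoted from the paper.

```python
import gym                       # OpenAI Gym (Brockman et al. 2016)
from minatar import Environment  # MinAtar (Young and Tian 2019)

# Gym control task named in the paper; the "-v2" suffix is an assumed version id.
lunar = gym.make("LunarLander-v2")
lunar.reset()

# MinAtar games named in the paper; the lowercase ids below are assumptions
# based on MinAtar's naming convention.
minatar_games = ["asterix", "breakout", "space_invaders", "seaquest", "freeway"]
minatar_envs = {name: Environment(name) for name in minatar_games}
for env in minatar_envs.values():
    env.reset()
```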
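
For the Experiment Setup row, the sketch below simply enumerates the hyperparameter sweep described in the quoted passage, assuming a plain dict-per-configuration representation; the keys (`algo`, `alpha`, `omega`, `tau`) are illustrative names, not identifiers taken from the paper.

```python
from itertools import product

# Candidate values reported in the paper's experiment setup.
ALPHA_SET = (0.2, 0.3, 0.5, 0.9)

# AL (Bellemare et al. 2016): sweep alpha over the candidate set.
al_configs = [{"algo": "AL", "alpha": a} for a in ALPHA_SET]

# SAL: sweep (omega, alpha) pairs, keeping only those with alpha < omega.
sal_configs = [
    {"algo": "SAL", "omega": w, "alpha": a}
    for w, a in product(ALPHA_SET, ALPHA_SET)
    if a < w
]

# M-DQN (Vieillard, Pietquin, and Geist 2020): tau fixed at 0.03, sweep alpha.
mdqn_configs = [{"algo": "M-DQN", "tau": 0.03, "alpha": a} for a in ALPHA_SET]

if __name__ == "__main__":
    for cfg in al_configs + sal_configs + mdqn_configs:
        print(cfg)
```

With the values quoted above, this yields 4 AL configurations, 6 valid (ω, α) pairs for SAL, and 4 M-DQN configurations.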