Smoothing Advantage Learning
Authors: Yaozhong Gan, Zhe Zhang, Xiaoyang Tan
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present our experimental results conducted over six games (LunarLander, Asterix, Breakout, Space Invaders, Seaquest, Freeway) from Gym (Brockman et al. 2016) and MinAtar (Young and Tian 2019). In addition, we also run some experiments on Atari games in the Appendix. |
| Researcher Affiliation | Academia | College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics |
| Pseudocode | Yes | Algorithm 1 in the Appendix gives the detailed implementation pipeline. |
| Open Source Code | No | The paper does not provide a link to open-source code for the methodology described, nor does it explicitly state that the code is made publicly available. |
| Open Datasets | Yes | In this section, we present our experimental results conducted over six games (LunarLander, Asterix, Breakout, Space Invaders, Seaquest, Freeway) from Gym (Brockman et al. 2016) and MinAtar (Young and Tian 2019). |
| Dataset Splits | No | The paper mentions test procedures but does not specify training, validation, or test dataset splits (e.g., percentages or sample counts) for the environments used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Particularly, we choose α from the set of {0.2, 0.3, 0.5, 0.9} for AL (Bellemare et al. 2016). For SAL, we choose ω and α among {0.2, 0.3, 0.5, 0.9}, with the constraint α < ω. For Munchausen-DQN (M-DQN) (Vieillard, Pietquin, and Geist 2020), we fix τ = 0.03 and choose α from the set of {0.2, 0.3, 0.5, 0.9}. (A sketch of these grids follows the table.) |
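
As a point of reference, the sketch below enumerates the hyperparameter grids quoted in the Experiment Setup row. The candidate values {0.2, 0.3, 0.5, 0.9}, the fixed τ = 0.03 for M-DQN, and the α < ω constraint for SAL all come from the paper; the function names and the enumeration code itself are illustrative assumptions, since the authors' code is not publicly released.

```python
# Minimal sketch of the hyperparameter grids described in the paper's
# Experiment Setup. Function names are hypothetical; only the candidate
# values and the alpha < omega constraint are taken from the paper.
from itertools import product

CANDIDATES = [0.2, 0.3, 0.5, 0.9]

def al_grid():
    """AL (Bellemare et al. 2016): a single advantage coefficient alpha."""
    return [{"alpha": a} for a in CANDIDATES]

def sal_grid():
    """SAL: (omega, alpha) pairs from the same set, constrained to alpha < omega."""
    return [{"omega": w, "alpha": a}
            for w, a in product(CANDIDATES, CANDIDATES) if a < w]

def mdqn_grid():
    """M-DQN (Vieillard, Pietquin, and Geist 2020): tau fixed at 0.03, alpha swept."""
    return [{"tau": 0.03, "alpha": a} for a in CANDIDATES]

if __name__ == "__main__":
    print("AL configs:   ", len(al_grid()))    # 4
    print("SAL configs:  ", len(sal_grid()))   # 6 pairs satisfy alpha < omega
    print("M-DQN configs:", len(mdqn_grid()))  # 4
```

Note that the α < ω constraint cuts the SAL grid from 16 candidate pairs down to 6, so the sweep reported in the paper is small enough to run exhaustively per game.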