Controlling Underestimation Bias in Reinforcement Learning via Quasi-median Operation
Authors: Wei Wei, Yujia Zhang, Jiye Liang, Lin Li, Yuze Li (pp. 8621-8628)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the discrete and continuous action tasks, and results show that our method outperforms the state-of-the-art methods. In this section, we empirically evaluate our method in the discrete and continuous action environments. |
| Researcher Affiliation | Academia | Wei Wei, Yujia Zhang, Jiye Liang*, Lin Li, Yuze Li School of Computer and Information Technology, Shanxi University, Taiyuan 030006, P.R. China weiwei@sxu.edu.cn, 342564535@qq.com, ljy@sxu.edu.cn, lilynn1116@sxu.edu.cn, 202022407033@email.sxu.edu.cn |
| Pseudocode | Yes | Algorithm 1: QMQ algorithm; Algorithm 2: QMD3 algorithm |
| Open Source Code | No | The paper does not provide any explicit statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | For the discrete action environments, we choose 6 games from Gym (Brockman et al. 2016), PLE (Tasfi 2016), and MinAtar (Young and Tian 2019): LunarLander-v2, Catcher-v0, Pixelcopter-v0, Asterix-v0, Breakout-v0, and SpaceInvaders-v0 to evaluate QMQ. ... For the continuous action environments, we compare the proposed QMD3... on 8 MuJoCo tasks (Todorov, Erez, and Tassa 2012): InvertedPendulum-v2 (IP), InvertedDoublePendulum-v2 (IDP), Reacher-v2, Hopper-v3, HalfCheetah-v3, Walker2d-v3, Ant-v3, and Humanoid-v3 |
| Dataset Splits | No | The paper describes training processes within environments and evaluations of performance, but does not specify 'training/test/validation dataset splits' with percentages or sample counts as typically understood in supervised learning. It mentions 'validation' in contexts like convergence proof or improving exploration, but not for dataset partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions environments like OpenAI Gym, PLE, MinAtar, and MuJoCo, but does not provide specific version numbers for software dependencies or libraries used for implementation. |
| Experiment Setup | No | The paper states that 'More detailed information about the rendering of the environment, hyper-parameters, and implementation details can be found in Appendix D.A and Appendix E.' and 'For more details about the rendering of the environment, hyper-parameters, and implementation details, please refer to Appendix D.B and Appendix E.', indicating that these specific details are not present in the main text. |
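The paper's QMQ/QMD3 pseudocode is not reproduced here, and its exact operator is defined in the paper itself; as context for the title, the standard statistical quasi-median of n ordered values is the average of the i-th smallest and i-th largest order statistics. A minimal sketch under that generic definition, applied to an ensemble of Q-value estimates (the default `i=2` is an illustrative assumption, not taken from the paper):

```python
def quasi_median(values, i=2):
    """Quasi-median: average of the i-th smallest and i-th largest order
    statistics of `values`. With i=2 and n >= 3, the two extremes are
    discarded, giving an estimate that is less optimistic than the max
    and less pessimistic than the min.

    NOTE: generic statistical definition for illustration only; the
    operator actually used by QMQ/QMD3 is specified in the paper.
    """
    v = sorted(values)
    n = len(v)
    if not 1 <= i <= n:
        raise ValueError("order index i must satisfy 1 <= i <= n")
    return 0.5 * (v[i - 1] + v[n - i])

# Example: target Q-value from 4 critic estimates, one optimistic outlier.
q_estimates = [1.0, 2.0, 3.0, 10.0]
print(quasi_median(q_estimates))  # averages 2.0 and 3.0 -> 2.5
```

Compared with the minimum over critics (as in clipped double Q-learning), such an order-statistic keeps the target between the pessimistic and optimistic extremes, which is the bias-control idea the title refers to.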