Distributional Reinforcement Learning for Multi-Dimensional Reward Functions
Authors: Pushi Zhang, Xiaoyu Chen, Li Zhao, Wei Xiong, Tao Qin, Tie-Yan Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we provide empirical results to answer the following questions: In policy evaluation settings, can MD3QN accurately model the joint distribution of multiple sources of reward? In policy optimization settings, can MD3QN learn a better policy than HRA and distributional RL algorithms in environments with multiple sources of reward? |
| Researcher Affiliation | Collaboration | Pushi Zhang (Tsinghua University, zpschang@gmail.com); Xiaoyu Chen (Tsinghua University, chen-xy21@mails.tsinghua.edu.cn); Li Zhao (Microsoft Research Asia, lizo@microsoft.com); Wei Xiong (The Hong Kong University of Science and Technology, wxiongae@connect.ust.hk); Tao Qin (Microsoft Research Asia, taoqin@microsoft.com); Tie-Yan Liu (Microsoft Research Asia, tyliu@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: Gradient estimation of MMD² loss by transition samples |
| Open Source Code | Yes | Our code for the experiments is available in the supplementary material. |
| Open Datasets | Yes | On Atari games from the Arcade Learning Environment (Bellemare et al., 2013). We use six Atari games with multiple sources of rewards: Air Raid, Asteroids, Gopher, Ms Pacman, UpNDown, and Pong. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits, percentages, or counts. It only refers to 'training curve results' on Atari games. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as exact GPU/CPU models, processor types, or memory. |
| Software Dependencies | No | The paper mentions that 'Our implementation of MD3QN is built upon the Dopamine framework (Castro et al., 2018)', but it does not provide specific version numbers for Dopamine or any other software dependencies needed for reproducibility. |
| Experiment Setup | Yes | The hyper-parameter settings used by MD3QN and the environment settings are detailed in Appendix A.2. |
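
The Pseudocode row refers to an MMD² loss computed from transition samples. As a rough illustration only, the sketch below assumes that, as in MMD-based distributional RL, the joint return distribution over d reward sources is represented by a set of d-dimensional particles, that the target particles are formed by applying r + γz′ to each next-state particle, and that the kernel is a single Gaussian with a fixed bandwidth. The function names (`gaussian_kernel`, `bellman_target_particles`, `mmd_squared`), the kernel choice, and the biased (V-statistic) estimator are all assumptions for illustration; this is not the authors' Algorithm 1 or their Dopamine-based code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Assumed kernel: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) on R^d."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def bellman_target_particles(
    reward: Vector, next_particles: List[Vector], gamma: float, done: bool
) -> List[List[float]]:
    """Apply r + gamma * z' to every d-dimensional next-state particle z'.

    The reward is a vector with one entry per reward source; at terminal
    transitions the bootstrap term is dropped (discount set to 0).
    """
    discount = 0.0 if done else gamma
    return [[r + discount * z for r, z in zip(reward, p)] for p in next_particles]


def mmd_squared(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of MMD^2 between two particle sets."""

    def mean_kernel(a: List[Vector], b: List[Vector]) -> float:
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    # Two reward sources (d = 2), two particles per distribution (hypothetical values).
    current = [[0.0, 1.0], [1.0, 0.0]]
    next_state = [[0.0, 1.0], [1.0, 0.0]]
    target = bellman_target_particles([1.0, 0.0], next_state, gamma=0.9, done=False)
    print(target, mmd_squared(current, target))
```

In a training loop, the target particles would typically be treated as constants and the gradient of this loss taken only with respect to the particles produced for the current state-action pair; the bandwidth (or a mixture of bandwidths) would be a tuning choice, not something this report specifies.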